The early evolution of the H-free process



Tom Bohman    Peter Keevash



Abstract 

The H-free process, for some fixed graph H, is the random graph process defined by starting
with an empty graph on n vertices and then adding edges one at a time, chosen uniformly
at random subject to the constraint that no H subgraph is formed. Let G be the random
maximal H-free graph obtained at the end of the process. When H is strictly 2-balanced, we
show that for some $c > 0$, with high probability as $n \to \infty$, the minimum degree in G is at least
$cn^{1-(v_H-2)/(e_H-1)}(\log n)^{1/(e_H-1)}$. This gives new lower bounds for the Turán numbers of certain
bipartite graphs, such as the complete bipartite graphs $K_{r,r}$ with $r \ge 5$. When H is a complete
graph $K_s$ with $s \ge 5$ we show that for some $C > 0$, with high probability the independence
number of G is at most $Cn^{2/(s+1)}(\log n)^{1-1/(e_H-1)}$. This gives new lower bounds for Ramsey
numbers $R(s,t)$ for fixed $s \ge 5$ and t large. We also obtain new bounds for the independence
number of G for other graphs H, including the case when H is a cycle. Our proofs use the
differential equations method for random graph processes to analyse the evolution of the process,
and give further information about the structure of the graphs obtained, including asymptotic
formulae for a broad class of subgraph extension variables.



1 Introduction 

Random graph processes provide a natural context for modeling a complex network that evolves over
time. While there has been considerable recent interest in using such processes to model networks
that arise in applications (see [11] and the references therein), random graphs have long been an
important component in the construction of sophisticated combinatorial objects (see [3]). In the
classical Erdős-Rényi random graph model $G(n,p)$ each pair of vertices appears as an edge with
probability $p = p(n)$ and these choices are mutually independent. The closely related random graph
$G(n,i)$ is chosen uniformly at random from the collection of all graphs with n vertices and i edges.
These models are well understood, but distributions on graphs given by random processes in which
there is significant dependence among the choices made in different rounds are typically much more
difficult to analyse. For many such processes even the most basic quantities, such as the number of
edges in the final graph, are not known (see [21], for example).

In this paper we analyse a significant portion of the initial evolution of the H-free process, for
some fixed graph H, defined by starting with an empty graph on n vertices and then adding edges
one at a time, chosen uniformly at random subject to the constraint that no H subgraph is formed.
More formally, we begin with the graph on n vertices with no edges, which we denote $G(0)$. Now
suppose $i > 0$ and we have some graph $G(i-1)$. We say that a pair uv of vertices is open in $G(i-1)$
if uv is not an edge of $G(i-1)$ and $G(i-1) \cup \{uv\}$ does not contain H as a subgraph. We choose
uv uniformly at random among all open pairs in $G(i-1)$ and then $G(i)$ is obtained from $G(i-1)$ by
adding the edge $e_i = uv$. The process terminates when there are no open pairs, with some graph G
on n vertices that is a maximal H-free graph. Besides being of interest in its own right, our analysis
of this process produces new results in Ramsey theory and the theory of Turán problems.
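
The definition above can be simulated directly for small n. The following is a minimal sketch, assuming the special case $H = K_3$ (the triangle-free process), where a pair is open exactly when it is a non-edge with no common neighbour. The function name `triangle_free_process` and the `seed` parameter are illustrative choices, not notation from the paper.

```python
import itertools
import random


def triangle_free_process(n, seed=0):
    """Run the H-free process for H = K_3 on vertex set {0, ..., n-1}.

    Returns the list of edges in the order in which they were added.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    # G(0) has no edges, so initially every pair is open.
    open_pairs = set(itertools.combinations(range(n), 2))
    edges = []
    while open_pairs:
        # Choose uniformly at random among the currently open pairs.
        u, v = rng.choice(sorted(open_pairs))
        open_pairs.discard((u, v))
        # A pair {u, w} with w adjacent to v (or {v, w} with w adjacent to u)
        # would now complete a triangle, so it becomes closed.
        for w in adj[v]:
            open_pairs.discard((min(u, w), max(u, w)))
        for w in adj[u]:
            open_pairs.discard((min(v, w), max(v, w)))
        adj[u].add(v)
        adj[v].add(u)
        edges.append((u, v))
    return edges
```

The loop stops when no open pairs remain, so the returned graph is a maximal triangle-free graph. Re-sorting the open pairs at every step keeps the sketch short and reproducible, at the cost of speed; it is only meant for small n.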

Erdős, Suen and Winkler [17] suggested this process as a means to generate an interesting
probability distribution on the collection of maximal H-free graphs, or more generally maximal graphs
with any fixed graph property.¹ They obtained results on the triangle-free process and the bipartite
process, using a differential equations method that had been previously applied by Ruciński and
Wormald [28] to analyse the 'maximum degree d' process. Another motivation for their work was
that their analysis of the triangle-free process led to the best lower bound on the Ramsey number
$R(3,t)$ known at that time.

Ramsey theory encompasses a variety of results expressing the informal principle that all large
systems have some structure. It is a source of many challenging unsolved combinatorial problems
and has applications throughout mathematics. We refer the reader to [22] for an introduction to
the subject. The Ramsey number $R(s,t)$ is the least number n such that any graph on n vertices
contains a complete graph with s vertices or an independent set with t vertices. In general, very
little is known about these numbers, even approximately. The upper bound $R(3,t) = O(t^2/\log t)$
was obtained by Ajtai, Komlós and Szemerédi [1], but for many years the best known lower bound,
due to Erdős [12], was $\Omega(t^2/\log^2 t)$. Spencer conjectured that the triangle-free process is likely to
produce a graph that establishes a good lower bound on $R(3,t)$ for t large; the idea being that the
triangle-free process admits enough random edges to bring the independence number close to the
smallest possible for a triangle-free graph. Finally, Kim [23] determined the order of magnitude,
showing that $R(3,t) = \Theta(t^2/\log t)$. His proof made use of a semi-random construction that is
motivated (even guided) by the triangle-free process, but the question remained open as to whether
the triangle-free process itself gives such a good construction. This was answered by Bohman [7], who
showed that with high probability, the graph produced by the triangle-free process has independence
number bounded above by $O(n^{1/2}\log^{1/2} n)$ and minimum degree bounded below by $\Omega(n^{1/2}\log^{1/2} n)$.
He went on to analyse the $K_4$-free process, improving the best known lower bound on $R(4,t)$ to
$R(4,t) = \Omega(t^{5/2}/\log^2 t)$.

The general H-free process was independently studied by Osthus and Taraz [26] and by Bollobás
and Riordan [8]. Say that a graph H is strictly 2-balanced if the number of vertices $v_H$ and edges $e_H$
in H are both at least 3 and

$$\frac{e_H - 1}{v_H - 2} > \frac{e_K - 1}{v_K - 2}$$

for all proper subgraphs K of H with $v_K \ge 3$. Osthus and Taraz showed that if H is strictly
2-balanced then for some $c, C > 0$, with high probability the final graph G of the H-free process has
average degree at least $cn^{1-(v_H-2)/(e_H-1)}$ and maximum degree at most
$Cn^{1-(v_H-2)/(e_H-1)}(\log n)^{1/(\Delta_H-1)}$, where $\Delta_H$ is the maximum degree of H. (In fact
they proved the average degree bound under a similar but weaker condition on H.) Wolfovitz [35]
showed that if H is strictly 2-balanced and regular then the expected number of edges in G is at
least $cn^{2-(v_H-2)/(e_H-1)}(\log\log n)^{1/(e_H-1)}$. An immediate consequence is an improved lower bound
for Turán numbers, which leads us to another motivation for studying the H-free process.

¹ Bollobás (personal communication) informs us that such processes were considered earlier, if not in print.
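
For small graphs the strictly 2-balanced condition can be checked by brute force. The sketch below is an illustration only (the function name `is_strictly_2_balanced` is hypothetical). It uses the observation that a proper subgraph on the full vertex set has fewer edges and hence a smaller ratio, while on a proper vertex subset the ratio is largest for the induced subgraph, so only induced subgraphs on 3 to $v_H - 1$ vertices need to be examined.

```python
from fractions import Fraction
from itertools import combinations


def is_strictly_2_balanced(vertices, edges):
    """Brute-force test of the strictly 2-balanced condition for a small graph H."""
    vertices = list(vertices)
    edges = {frozenset(e) for e in edges}
    v_h, e_h = len(vertices), len(edges)
    if v_h < 3 or e_h < 3:
        return False
    density_h = Fraction(e_h - 1, v_h - 2)
    # Check every induced subgraph on k vertices, 3 <= k < v_H.
    for k in range(3, v_h):
        for subset in combinations(vertices, k):
            s = set(subset)
            e_k = sum(1 for e in edges if e <= s)
            if Fraction(e_k - 1, k - 2) >= density_h:
                return False
    return True
```

For example, complete graphs and cycles pass this test, whereas a triangle with a pendant edge fails because the triangle has ratio 2 while the whole graph has ratio 3/2.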

The Turán number $\mathrm{ex}(n,H)$ is the maximum possible number of edges in a graph on n vertices
that does not contain an H subgraph. More generally, the theory of Turán problems concerns the
study of combinatorial structures that have maximum size subject to not containing some fixed
structure. We refer the reader to [18] for a survey of this subject. Turán [34] determined the value
of $\mathrm{ex}(n,H)$ when $H = K_r$ is complete: the unique largest graph on n vertices with no $K_r$ subgraph
is complete $(r-1)$-partite with part sizes as equal as possible. For general H, the
Erdős-Stone-Simonovits theorem [16, 14] gives the estimate $\mathrm{ex}(n,H) = \mathrm{ex}(n,K_r) + o(n^2)$, where $r = \chi(H)$ is the
chromatic number of H. This gives an asymptotic formula for the Turán number when H is not
bipartite. However, when H is bipartite it is an open problem in general to determine even the order
of magnitude of $\mathrm{ex}(n,H)$. For example, when $H = K_{r,r}$ is complete bipartite with $r \ge 5$, for many
years the best known lower bound was $\mathrm{ex}(n,K_{r,r}) = \Omega(n^{2-2/(r+1)})$, a result of Erdős and Spencer
[15] proved via a simple application of the probabilistic method. Wolfovitz's analysis of the H-free
process improved this to $\mathrm{ex}(n,K_{r,r}) = \Omega(n^{2-2/(r+1)}(\log\log n)^{1/(r^2-1)})$.

1.1 Results I: Ramsey and Turán bounds

In this paper we extend the methods from [7] to an analysis of the H-free process when H is strictly 
2-balanced, leading to new lower bounds for Ramsey and Turan numbers. We also investigate other 
properties of the process, viewing it as a model of interest in its own right, and give certain extension 
counting formulae that address a question of Spencer. In particular, we show that the graph produced 
by the H-free process is very similar to the corresponding random graph G(n, i) with respect to small 
subgraph counts, with the exception that the H-free process produces no copies of graphs containing
H. We begin with the Turan and Ramsey results. 

Our first theorem gives a new lower bound for the number of edges in G. In fact we have a new 
lower bound for the minimum degree, and it holds with high probability, not just in expectation. An 
immediate consequence is a lower bound for the Turán number $\mathrm{ex}(n,H)$.

Theorem 1.1 Suppose that H is a strictly 2-balanced graph with $v_H$ vertices and $e_H$ edges. Then
for some $c > 0$ with high probability the minimum degree in the final graph of the H-free process is
at least $cn^{1-(v_H-2)/(e_H-1)}(\log n)^{1/(e_H-1)}$. In particular, the Turán number satisfies

$$\mathrm{ex}(n,H) = \Omega\left(n^{2-(v_H-2)/(e_H-1)}(\log n)^{1/(e_H-1)}\right).$$

Note that it follows immediately from Theorem 1.1 that we have

$$\mathrm{ex}(n,K_{r,r}) = \Omega\left(n^{2-2/(r+1)}(\log n)^{1/(r^2-1)}\right).$$
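
The exponents in this bound come from substituting $v_H = 2r$ and $e_H = r^2$ into the minimum degree bound of Theorem 1.1 and multiplying by $n/2$. As a small sanity check, the following sketch computes them with exact rational arithmetic; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction


def turan_lower_bound_exponents(v_h, e_h):
    """Exponents (a, b) in a lower bound of the form Omega(n^a (log n)^b).

    A minimum degree of order n^(1 - (v_H-2)/(e_H-1)) (log n)^(1/(e_H-1))
    gives at least n/2 times that many edges.
    """
    a = 2 - Fraction(v_h - 2, e_h - 1)
    b = Fraction(1, e_h - 1)
    return a, b


def complete_bipartite_exponents(r):
    # K_{r,r} has 2r vertices and r^2 edges.
    return turan_lower_bound_exponents(2 * r, r * r)
```

For r = 5 this gives exponents 5/3 and 1/24, so the bound is $n^{5/3}(\log n)^{1/24}$.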

For general complete bipartite graphs $K_{r,s}$ with $r \le s$, the 'Zarankiewicz problem' of estimating
$\mathrm{ex}(n,K_{r,s})$ is a subject of special interest in extremal graph theory. A general upper bound of
order $n^{2-1/r}$ was given by Kővári, Sós and Turán [23]. The only known asymptotic results are
$\mathrm{ex}(n,K_{2,r}) \sim \frac{1}{2}(r-1)^{1/2}n^{3/2}$ (see [19]) and $\mathrm{ex}(n,K_{3,3}) \sim \frac{1}{2}n^{5/3}$ (see and [20]). Note that the
lower bound construction for $K_{3,3}$ also gives the best known lower bound for $K_{4,4}$. The only other
case when the upper bound is known to be of the correct order of magnitude is when $s > (r-1)!$
(see [3]). The known constructions are based on algebraic and geometric structures that may not
exist for other values of the parameters r and s. However, it is widely believed that $\mathrm{ex}(n,K_{r,s})$ for
general $r \le s$ is on the order of $n^{2-1/r}$.

For Ramsey numbers, we obtain the following new lower bounds.

Theorem 1.2 For fixed $s \ge 5$ and $t \to \infty$, the Ramsey number satisfies

$$R(s,t) = \Omega\left(t^{\frac{s+1}{2}}(\log t)^{\frac{1}{s-2}-\frac{s+1}{2}}\right).$$

The previously best known lower bound on $R(s,t)$ when s is fixed and t is large was
$R(s,t) = \Omega\left((t/\log t)^{\frac{s+1}{2}}\right)$, established by Spencer [31] using the Lovász Local Lemma. Theorem 1.2 improves
this by a multiplicative factor of $(\log t)^{1/(s-2)}$. There is no particular reason to believe that our lower
bound is anywhere near optimal, since the best known general upper bound is essentially $t^{s-1}$ (up
to a polylogarithmic factor in t). On the other hand, as Theorem 1.2 can be viewed as the natural
generalisation of the construction that gives the correct order of magnitude for $R(3,t)$, it would be
interesting to see a significant improvement on the bound in Theorem 1.2 for $s \ge 4$.
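
The exponents in this Ramsey bound can be recovered by inverting an independence number bound of the form $t = Cn^{2/(s+1)}(\log n)^{1-1/(e_H-1)}$ with $e_H = \binom{s}{2}$, using $\log n = \Theta(\log t)$ for fixed s. The sketch below carries out this arithmetic exactly; the function name `ramsey_exponents` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def ramsey_exponents(s):
    """Exponents (of t, of log t) in the lower bound on R(s, t).

    Obtained by solving t = n^a (log n)^b for n, which gives
    n = t^(1/a) (log t)^(-b/a) up to constant factors.
    """
    e_h = comb(s, 2)
    a = Fraction(2, s + 1)          # exponent of n in the independence bound
    b = 1 - Fraction(1, e_h - 1)    # exponent of log n in the independence bound
    return 1 / a, -b / a
```

Since $e_H - 1 = (s-2)(s+1)/2$, the second exponent simplifies to $1/(s-2) - (s+1)/2$, which exceeds the exponent $-(s+1)/2$ in Spencer's bound by exactly $1/(s-2)$.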

We also obtain new lower bounds for cycle-complete Ramsey numbers. Given graphs $H_1$, $H_2$, the
graph Ramsey number $R(H_1,H_2)$ is the least number n such that for any 2-colouring of the edges
of $K_n$ there is a monochromatic copy of $H_1$ or $H_2$. Note that $R(C_\ell,K_t) > n$ if and only if there is a
$C_\ell$-free graph on n vertices with no independent set of size t. We prove the following bound.

Theorem 1.3 For fixed $\ell \ge 4$ and $t \to \infty$ the cycle-complete Ramsey number satisfies

$$R(C_\ell,K_t) = \Omega\left(t^{\frac{\ell-1}{\ell-2}}/\log t\right).$$

Again this is quite far from the best known upper bounds (see [10, 25, 33]). For example, Erdős [13]
conjectured that $R(C_4,K_t) = O(t^{2-\epsilon})$ for some absolute constant $\epsilon > 0$, but this is still open.

In fact, we establish more general properties of the H-free process from which these theorems
follow. In order to show that the process continues to run for a certain number of steps, we will 
establish asymptotic formulae for various graph parameters at any given time in the process, including 
the degree of any vertex, but also more general extension parameters. To state these formulae we 
need some terminology and notation. 

1.2 Terminology and notation I 

We write $[n] = \{1, \dots, n\}$ for the vertex set of the process. At step i of the process let $E(i)$ be the
edges of the graph $G(i)$, let $O(i)$ be the pairs of vertices that are open (as defined above), and let
$C(i)$ be the pairs of vertices that are neither edges nor open, which we refer to as closed.

We fix some strictly 2-balanced graph H throughout the paper and write

$$p = n^{-(v_H-2)/(e_H-1)}.$$

For any graph $\Gamma$ we write $V_\Gamma$ for the vertex set of $\Gamma$, $E_\Gamma$ for the edge set of $\Gamma$, $v_\Gamma = |V_\Gamma|$ and $e_\Gamma = |E_\Gamma|$.
For $A \subseteq V_\Gamma$ we write

$$S_\Gamma = p^{e_\Gamma} n^{v_\Gamma} \quad\text{and}\quad S_{A,\Gamma} = p^{e_\Gamma - e_{\Gamma[A]}} n^{v_\Gamma - |A|}.$$

We say that such a pair $(A,\Gamma)$ is strictly balanced if $S_{A,\Gamma[B]} > S_{A,\Gamma}$ for every $A \subsetneq B \subsetneq V_\Gamma$ and strictly
dense if $S_{A,\Gamma[B]} > 1$ for every $A \subsetneq B \subseteq V_\Gamma$.

A key element of our analysis of the H-free process is closely tracking the number of extensions
from fixed sets of vertices to fixed subgraphs of $G(i)$. Intuitively, the graph $G(i)$ produced by the
H-free process should be roughly equal to the random graph $G(n,i)$, the graph chosen uniformly at
random from the collection of graphs with n vertices and i edges, up until the number of copies of
H in $G(n,i)$ is roughly equal to the number of edges. This occurs when i is roughly $pn^2$, with p
as defined above. We expect the more interesting part of the evolution of the H-free process to be
at and beyond this range of i. Considering $G(n,p)$, which is very similar to $G(n,i)$ here, we note
that $S_\Gamma$ is roughly the expected number of labeled copies of $\Gamma$, and $S_{A,\Gamma}$ is roughly the expected
number of labeled extensions to $\Gamma$ from a fixed set of vertices playing the role of A. Thus we can
think of these quantities as anticipated scalings by which we should measure the same parameters
in the H-free process.
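
Since $p = n^{-(v_H-2)/(e_H-1)}$, each scaling $S_{A,\Gamma} = p^{e_\Gamma - e_{\Gamma[A]}} n^{v_\Gamma - |A|}$ is a power of n, and comparing it with 1 amounts to checking the sign of the exponent. The following sketch computes that exponent exactly; the function name and argument names are illustrative, not notation from the paper.

```python
from fractions import Fraction


def scaling_exponent(v_h, e_h, v_gamma, e_gamma, size_a=0, e_gamma_a=0):
    """Exponent of n in S_{A,Gamma}, where p = n^(-(v_H - 2)/(e_H - 1)).

    size_a is |A| and e_gamma_a is the number of edges of Gamma inside A;
    the defaults correspond to S_Gamma (A empty).
    """
    p_exponent = -Fraction(v_h - 2, e_h - 1)
    return (e_gamma - e_gamma_a) * p_exponent + (v_gamma - size_a)
```

For $H = K_4$, taking $\Gamma = H \setminus ab$ and $A = \{a,b\}$ gives exponent 0, that is $S_{A,\Gamma} = 1$. Similarly $S_H$ and $pn^2$ have the same exponent, reflecting that $i \approx pn^2$ is the point at which copies of H and edges are comparable in number.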

In order to track extensions, we track all 'open routes' to such extensions. Suppose $\Gamma$ is a graph
and J is a spanning subgraph of $\Gamma$. Suppose also that $A \subseteq V_\Gamma$ is an independent set in $\Gamma$ and
$\phi : A \to [n]$ is an injective mapping. We define the extension variables $X_{\phi,J,\Gamma}(i)$ to be the number of
injective maps $f : V_\Gamma \to [n]$ such that

(i) $f(e) \in O(i)$ for every $e \in E_\Gamma \setminus E_J$,

(ii) $f(e) \in E(i)$ for every $e \in E_J$, and

(iii) f restricts to $\phi$ on A.
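
For very small instances the extension variables can be computed by exhaustive enumeration, which may help when checking the definition against examples. The sketch below is an illustration under the assumption that the edge set $E(i)$ and the open set $O(i)$ are supplied explicitly as lists of pairs; the function name `count_extensions` and its arguments are hypothetical.

```python
from itertools import permutations


def count_extensions(n, edge_set, open_set, gamma_vertices, gamma_edges, j_edges, phi):
    """Count injective maps f from V_Gamma to {0, ..., n-1} extending phi such that
    every edge of J maps to an edge and every edge of Gamma not in J maps to an open pair."""
    edge_set = {frozenset(e) for e in edge_set}
    open_set = {frozenset(e) for e in open_set}
    gamma_edges = {frozenset(e) for e in gamma_edges}
    j_edges = {frozenset(e) for e in j_edges}
    used = set(phi.values())
    free = [v for v in gamma_vertices if v not in phi]
    targets = [u for u in range(n) if u not in used]
    count = 0
    # Enumerate all injective placements of the vertices outside A.
    for image in permutations(targets, len(free)):
        f = dict(phi)
        f.update(zip(free, image))
        ok = True
        for e in gamma_edges:
            a, b = tuple(e)
            required = edge_set if e in j_edges else open_set
            if frozenset((f[a], f[b])) not in required:
                ok = False
                break
        if ok:
            count += 1
    return count
```

For instance, with $\Gamma = J$ a single edge ab and $\phi$ mapping a to a vertex v, the count is the degree of v; with $\phi$ empty it is twice the number of edges, since maps are counted as labeled copies.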

We say that the random variable $X_{\phi,J,\Gamma}(i)$ is trackable if one of the following two conditions holds:

(a) $(A,\Gamma)$ is strictly dense and $\Gamma$ does not contain H as a subgraph, or

(b) $S_{A,\Gamma} = 1$, $(A,\Gamma)$ is strictly balanced, $E_J \subsetneq E_\Gamma$, and H is not a subgraph of the graph $\Gamma'$
obtained from $\Gamma$ by adding the edges ab for all $a,b \in A$ with $\phi(a)\phi(b) \in E(i)$.

It follows easily from the definitions that for any trackable extension variable $X_{\phi,J,\Gamma}(i)$ the pair $(A,J)$
is strictly dense. Note further that condition (b) includes the case where $\Gamma = H \setminus ab$ for some $ab \in E_H$,
$e_J \le e_H - 2$, $A = \{a,b\}$ and $\phi(ab) \notin E(i)$. These extensions comprise the set of open routes to a
copy of H less an edge, where $\phi(ab)$ plays the role of the missing edge. As the appearance of such
an extension is the mechanism whereby the pair $\phi(ab)$ becomes closed, these particular extension
variables play a central role in our analysis of the H-free process.

We fix constants $V, W, \epsilon, \mu$ throughout the paper which satisfy $0 < \mu \ll \epsilon \ll 1/W \ll 1/V \ll 1/e_H$.
(The notation $0 < \alpha \ll \beta$ means that there is an increasing function $f(x)$ so that the following
argument is valid for $0 < \alpha < f(\beta)$.) We introduce a continuous time variable t, using the scaling
$t = t(i) = i/s$ with $s = pn^2$, and analyse the process up to time $t_{\max} = \mu(\log n)^{1/(e_H-1)}$, which
corresponds to

$$m = \mu(\log n)^{1/(e_H-1)}pn^2$$

edges. Let $\mathcal{T}$ be the set of all triples $(A,J,\Gamma)$ where J is a spanning subgraph of a graph $\Gamma$ with
$v_\Gamma, e_\Gamma \le V$, A is an independent set in $\Gamma$, and the variables $X_{\phi,J,\Gamma}(0)$ are trackable. Write $\mathrm{aut}(H)$
for the number of automorphisms of H and define

$$q(t) = e^{-2e_H\mathrm{aut}(H)^{-1}(2t)^{e_H-1}}, \quad P(t) = W(t^{e_H-1} + t), \quad e(t) = e^{P(t)} - 1 \quad\text{and}\quad s_e = n^{1/(2e_H)-\epsilon}.$$

We also define 7(i) to be any smooth increasing function such that j(t) = 40Ve 40V t for < t < 
40V/W, i(t) > 2W for 40V/W < t < 1/(50V), and j(t) < 1/2, j'(t) < W for all t > 0. Then we 
set $\theta(t) = 1/2 + \gamma(t)$, so that $1/2 \le \theta(t) \le 1$ for all $t \ge 0$.

1.3 Results II: The H-free process 

Our first main theorem gives asymptotic formulae for trackable extension variables throughout the
process.

Theorem 1.4 With high probability, for every $i \le m$ and trackable extension variable $X_{\phi,J,\Gamma}(i)$
corresponding to a triple in $\mathcal{T}$, we have

$$X_{\phi,J,\Gamma}(i) = (1 \pm e(t)/s_e)(x_{A,J,\Gamma}(t) \pm 1/s_e)S_{A,J},$$

where

$$x_{A,J,\Gamma}(t) = (2t)^{e_J}q(t)^{e_\Gamma - e_J}.$$

(For this theorem to be useful we choose $\epsilon < \epsilon(V)$ sufficiently small and then $\mu < \mu(\epsilon)$ sufficiently
small so that $e(t)$ and $q(t)^{-V}$ are both at most $n^{\epsilon}$ for $t \le t_{\max}$.) Note, for example, that there is a
trackable extension variable describing the number of common neighbours of a set of size d whenever
$p^d n > 1$, so we have the following corollary.

Corollary 1.5 With high probability, for every d with $p^d n > 1$, set A of d vertices and $i \le m$, the
number of common neighbours of A in $G(i)$ is $(1 + o(1))(2i/n^2)^d n$.

A remarkable consequence of Theorem 1.4 is that the graph $G(i)$ for $i \le m$ is similar to the uniform
random graph $G(n,i)$ with respect to small subgraph counts, with the notable exception that there
are no copies of graphs containing H in $G(i)$. The possibility of this intriguing behavior was first
suggested by Joel Spencer. The following theorem gives the correct asymptotic counts for labelled
copies of a graph $\Gamma$ in the 'subcritical' case (i) and the 'supercritical' case (ii). For the sake of brevity
we just establish existence of a copy in the 'critical' case (iii), although our discussion in Section 10
points the way towards better results in this case.

Theorem 1.6 Suppose $\Gamma$ is an H-free graph and write $X_\Gamma(i)$ for the number of labelled copies of $\Gamma$
in $G(i)$. Then with high probability

(i) If there exists $B \subseteq V_\Gamma$ with $S_{\Gamma[B]} < 1$ then $X_\Gamma(m) = 0$.
(ii) If $S_{\Gamma[B]} > 1$ for all non-empty $B \subseteq V_\Gamma$ then $X_\Gamma(i) \sim (2i/n^2)^{e_\Gamma}n^{v_\Gamma}$.
(iii) If $S_{\Gamma[B]} \ge 1$ for all $B \subseteq V_\Gamma$ then $X_\Gamma(m) > 0$.
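
Which of the three cases applies to a given small graph can be decided by computing, over all non-empty vertex subsets B, the exponent of n in $S_{\Gamma[B]} = p^{e_{\Gamma[B]}}n^{|B|}$. The sketch below does this by brute force and is meant only as an illustration; the function names are hypothetical, and the caller is assumed to supply a graph $\Gamma$ that is H-free, as the theorem requires.

```python
from fractions import Fraction
from itertools import combinations


def complete_bipartite(a, b):
    """Vertex list and edge list of K_{a,b}."""
    left = [("L", i) for i in range(a)]
    right = [("R", j) for j in range(b)]
    return left + right, [(u, v) for u in left for v in right]


def classify_subgraph(v_h, e_h, vertices, edges):
    """Return 'subcritical', 'supercritical' or 'critical' according to the sign of
    the smallest exponent of n in S_{Gamma[B]} over non-empty B."""
    rate = Fraction(v_h - 2, e_h - 1)   # p = n^(-rate)
    vertices = list(vertices)
    edges = [frozenset(e) for e in edges]
    smallest = None
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            s = set(subset)
            e_b = sum(1 for e in edges if e <= s)
            exponent = k - rate * e_b
            if smallest is None or exponent < smallest:
                smallest = exponent
    if smallest < 0:
        return "subcritical"
    return "supercritical" if smallest > 0 else "critical"
```

For the triangle-free process ($v_H = e_H = 3$, so $p = n^{-1/2}$) this classifies $C_4 = K_{2,2}$ as supercritical, $K_{4,4}$ as critical and $K_{5,5}$ as subcritical.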

While Theorem 1.4 alone is enough to establish the Turán bounds stated above, our results on
the Ramsey numbers require an upper bound on the independence number of $G(m)$. Theorem 1.2
follows easily from Theorem 1.8 below. This in turn follows from the following more general result for $s \ge 6$.
(Then we will need to modify the proof slightly to deal with the case $s = 5$.)

Theorem 1.7 Suppose that H is strictly 2-balanced and that for any two edges uv, xy of H and
$\{x,y\} \subseteq B \subsetneq V_H$ we have $S_{B,H\setminus uv} < 1$. Then there is $C > 0$ such that with high probability the final
graph of the H-free process has independence number at most $Cn^{(v_H-2)/(e_H-1)}(\log n)^{1-1/(e_H-1)}$.

Theorem 1.8 For any $s \ge 5$ there is $C > 0$ such that with high probability the final graph of the
$K_s$-free process has independence number at most $Cn^{2/(s+1)}(\log n)^{1-1/(\binom{s}{2}-1)}$.

Alon, Ben-Shimon and Krivelevich [2] recently proposed a construction that takes a nearly regular
$K_s$-free graph G and produces a regular $K_s$-free graph with roughly the same independence number
as the original graph. It follows from Corollary 1.5 that the graph produced after m steps of the
$K_s$-free process is a suitable input for this construction. This suggests that the bound on $R(s,t)$
given in Theorem 1.2 can be achieved by a regular graph. (A formal proof would need to provide
some details missing from the sketch given in [2].)

We also obtain the following bound when H is a cycle, which implies Theorem 1.3.

Theorem 1.9 For any $\ell \ge 3$ there is $C > 0$ such that with high probability the final graph of the
$C_\ell$-free process has independence number at most $C(n\log n)^{(\ell-2)/(\ell-1)}$.

1.4 Organisation of the paper 

In the next section we give a heuristic explanation for the differential equations leading to the
formulae in Theorem 1.4. In Section 3 we develop some theory of strictly 2-balanced graphs and
balanced extensions. Over the following three sections we collect various properties that hold with
high probability on the 'good' event at a given time that the process has followed the trajectory
of the differential equations so far. Section 4 contains various union bound arguments, Section 5
gives upper bounds on the extension variables and Section 6 provides a means to approximate the
number of pairs that become closed when some particular pair is added as an edge. In Section 7
we formulate our framework for showing that the process follows the differential equations, which is
based to some extent on that given by Wormald [36], but also incorporates martingale estimates from
[7]. Section 8 concerns trackable random variables: we obtain bounds on the one-step changes of
trackable random variables sufficient to apply the differential equations method. Then we apply the
differential equations method in Section 9 to prove Theorem 1.4, from which Theorem 1.1 immediately
follows. We also apply Theorem 1.4 to prove Theorem 1.6 in Section 10. Next we turn our attention
to the independence number. In Section 11 we formulate a general property, which we call 'smooth
independence', and bound the independence number under the assumption that H has this property.
Then in Section 12 we show that cycles and complete graphs $K_s$, $s \ge 5$, have smooth independence,
from which Theorems 1.9 and 1.2 follow. We also prove Theorem 1.7 in this section. The final section
contains some concluding remarks.

1.5 Terminology and notation, II 

We write $\mathcal{G}_j$ for the good event that for every $0 \le i \le j$ and trackable extension variable $X_{\phi,J,\Gamma}(i)$
corresponding to a triple in $\mathcal{T}$, we have

$$X_{\phi,J,\Gamma}(i) = (1 \pm e(t)/s_e)(x_{A,J,\Gamma}(t) \pm \theta(t)/s_e)S_{A,J}.$$

Note that this implies the formulae in the statement of Theorem 1.4 since $\theta(t) \le 1$ for all $t \ge 0$.

When we count extensions it is convenient to work with labeled graphs, and we will often write
uv for the ordered pair $(u,v)$ as well as the edge $\{u,v\}$. The prime symbol ' is occasionally used to
denote differentiation with respect to the time variable t: this will be clear from the context.

Statements containing the symbols $\pm$ and/or $\mp$ are shorthand for two separate statements: one
with every $\pm$ replaced by $+$ and every $\mp$ by $-$, the other with $\pm$ replaced by $-$ and $\mp$ by $+$. We
also use the notation $a = b \pm c$ to mean $b - c \le a \le b + c$. Where there is possibility for confusion
we label the symbols as $\pm_1$ and $\pm_2$, e.g. $a^{\pm_1\pm_2} = b^{\pm_1} \pm_2 c^{\mp_2}$ is shorthand for 4 separate statements,
one of which is $a^{++} = b^{+} + c^{-}$.

The parameter n will always be sufficiently large compared to all other parameters, and we use the
phrase 'with high probability' to refer to an event that has probability $1 - o_n(1)$, i.e. the probability
tends to 1 as n tends to infinity. In fact we can arrange that our high probability events fail with
probability at most $\exp(-n^{\epsilon})$.

We say that a graph W is a join of two graphs $W_1$ and $W_2$ if it has subgraphs $J_1$ isomorphic
to $W_1$ and $J_2$ isomorphic to $W_2$ such that $V_W = V_{J_1} \cup V_{J_2}$ and $E_W = E_{J_1} \cup E_{J_2}$. For convenient
notation we use names for vertices in $J_1$ interchangeably with their corresponding vertices in $W_1$,
and similarly for $J_2$ and $W_2$.

If X is a set and k is a non-negative integer then we write $\binom{X}{k}$ for the set of subsets of X of size k.

We will not often refer explicitly to the underlying probability space for the H-free process, but
we note here the following natural construction. Let $\Omega = \Omega_n$ be the set of all maximal sequences
in $\binom{[n]}{2}$ with distinct entries and the property that each initial sequence gives an H-free graph on
vertex set $[n]$. We stress that our measure is not uniform: it is the measure given by the uniform
random choice at each step. We always work with the natural filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots$ given by the
process. Two elements x, y of $\Omega$ are in the same atom (i.e. part of the generating partition) of $\mathcal{F}_j$
exactly when the first j entries of x and y agree.

2 Trajectory equations

We start by giving a heuristic explanation of the equations describing the evolution of the H-free
process. We will then prove the validity of these equations in subsequent sections. Recall that $G(i)$
denotes the graph on $[n]$ obtained after i steps of the H-free process: its edge set $E(i)$ contains i
edges. We partition the non-edges $\binom{[n]}{2} \setminus E(i)$ into two sets $O(i)$ and $C(i)$, which we call open pairs
and closed pairs, respectively. We say that a pair uv is open if $G(i) \cup \{uv\}$ does not contain a copy of
H, i.e. uv is a possible choice for the next edge in the process.

Notation. We consider the following random variables. Suppose $\Gamma$ is a graph and J is a
spanning subgraph of $\Gamma$ (i.e. $V_J = V_\Gamma$). Suppose also that $A \subseteq V_J$ is an independent set
(i.e. does not span any edges) in $\Gamma$ and $\phi : A \to [n]$ is an injective mapping. Throughout
this paper we assume that $\Gamma, J, A, \phi$ satisfy these conditions, even if this is not explicitly
stated. We define the extension set $\Xi_{\phi,J,\Gamma}(i)$ to be the set of injective maps $f : V_\Gamma \to [n]$
such that (i) $f(e) \in O(i)$ for every $e \in E_\Gamma \setminus E_J$, (ii) $f(e) \in E(i)$ for every $e \in E_J$, and (iii)
f restricts to $\phi$ on A. Then we define the extension variables by $X_{\phi,J,\Gamma}(i) = |\Xi_{\phi,J,\Gamma}(i)|$.
In words, we are counting labeled copies (not necessarily induced) of a graph J in $G(i)$
that extend a particular embedding $\phi : A \to [n]$, with the extra condition that some extra
pairs (i.e. the edges of $\Gamma \setminus J$) are open. Actually we will be interested in the number of
copies up to isomorphism, but the equations for labeled copies are easier to work with.

Examples. One special case of this definition is the number of labeled copies of a graph
$\Gamma$ in $G(i)$: this can be written as $X_{\phi_0,\Gamma,\Gamma}(i)$, where we write $\phi_0$ for the unique function
$\phi_0 : \emptyset \to [n]$. To count edges and open pairs with this notation we write $e$ and $\bar{e}$ for the
two graphs on two vertices, say $\{a,b\}$, with one edge and no edges respectively. Then
$X_{\phi_0,\bar{e},e}(i) = 2|O(i)|$ and $X_{\phi_0,e,e}(i) = 2|E(i)|$. We can also express the degree $d_{G(i)}(v)$ of
a vertex v in $G(i)$ as $X_{\phi_v,e,e}(i)$, where again e is the edge ab and we write $\phi_v$ for the
function $\phi_v : \{a\} \to [n]$ defined by $\phi_v(a) = v$.

We write $Q(i) = 2|O(i)|$ for the number of ordered pairs that are open. For an ordered pair
$uv \in O(i)$, write $C_{uv}(i)$ for the set of ordered pairs $xy \in O(i)$ that would become closed, i.e. belong
to $C(i+1)$, if at time $i+1$ the process chooses uv as the edge $e_{i+1}$. By the definition of $C(i+1)$
this means that adding uv and xy to $G(i)$ would create a copy of H. Another way to say this is
that there is a subgraph J obtained by deleting two edges ab and cd from H and an injective map
$f : V_H \to [n]$ such that $f(a) = u$, $f(b) = v$, $f(c) = x$, $f(d) = y$ and $f(e) \in E(i)$ for every edge e of
J. We have $f \in \Xi_{\phi_T,J_T,\Gamma_T}(i)$, where given such a quadruple $T = (a,b,c,d)$, we write $\Gamma_T = H \setminus ab$,
$J_T = H \setminus \{ab,cd\}$ and define $\phi_T$ by $\phi_T(a) = u$ and $\phi_T(b) = v$. In principle there could be many
embeddings f giving the same pair xy, but we will show in Lemma 6.1 that this is very unlikely: for
most $xy \in C_{uv}(i)$ there will be exactly one such embedding f, up to an automorphism of H. We will
see that $C_{uv}(i) \approx \mathrm{aut}(H)^{-1}\sum_T X_{\phi_T,J_T,\Gamma_T}(i)$, where, abusing notation, $C_{uv}(i)$ also denotes the size
of this set and the sum is over quadruples $T = (a,b,c,d)$ such
that ab and cd are distinct (but not necessarily disjoint) edges of H.

To approximate the extension variables we introduce a continuous time variable t, using the
scaling $t = t(i) = i/s$ with $s = pn^2$, where we recall that $p = n^{-(v_H-2)/(e_H-1)}$. We noted above that
this is the point at which the number of copies of H in the random graph $G(n,s)$ is comparable to
the number of edges s, so it is natural to anticipate the interesting behaviour to occur at this scale.
We analyse the process up to time $t_{\max} = \mu(\log n)^{1/(e_H-1)}$, for some small constant $\mu > 0$, which
corresponds to $m = \mu(\log n)^{1/(e_H-1)}pn^2$ edges. For the variable $X_{\phi,J,\Gamma}(i)$ with $\phi : A \to [n]$ we use
the scaling $S_{A,J} = p^{e_J}n^{v_J-|A|}$. Again, we noted above that the count of these extensions in $G(n,s)$
suggests the use of this scaling. Our eventual aim is to prove that with high probability, for every
$i \le m$ and for every trackable extension variable $X_{\phi,J,\Gamma}(i)$ corresponding to a triple in $\mathcal{T}$, we have
the asymptotic formula

$$X_{\phi,J,\Gamma}(i) = (1 \pm e(t)/s_e)(x_{A,J,\Gamma}(t) \pm \theta(t)/s_e)S_{A,J},$$

where $x_{A,J,\Gamma}(t) = (2t)^{e_J}q(t)^{e_\Gamma-e_J}$ and $q(t)$, $e(t)$, $\theta(t)$, $s_e$ are as defined above.

Note that $x_{\emptyset,\bar{e},e}(t) = q(t)$, so the good event pertaining to $Q(i)$ is $Q(i) = (1 \pm e(t)/s_e)(q(t) \pm
\theta(t)/s_e)n^2$. We also write $c(t) = \mathrm{aut}(H)^{-1}\sum_T x_{\phi_T,J_T,\Gamma_T}(t)$, where as above the sum is over
quadruples $T = (a,b,c,d)$ such that ab and cd are distinct edges of H.

Now we give an informal derivation of the differential equations satisfied by the functions $x_{A,J,\Gamma}(t)$,
which describe the main terms for the behaviour of the variables $X_{\phi,J,\Gamma}(i)$. We stress that this discussion
does not constitute a proof of Theorem 1.4; rather, it motivates the functions $x_{A,J,\Gamma}(t)$ defined
above, and presages the proper proof given below, in which the calculations we make here will
play a central role. For the sake of the discussion we ignore the error terms described by $e(t)$
and $s_e$, and use the approximations $X_{\phi,J,\Gamma}(i) \approx x_{A,J,\Gamma}(t)S_{A,J}$, so $Q(i) \approx q(t)n^2$ and $C_{uv}(i) \approx
c(t)p^{e_H-2}n^{v_H-2} = c(t)p^{-1}$. The system of differential equations will follow from the approximation
$x_{A,J,\Gamma}(t + s^{-1}) \approx x_{A,J,\Gamma}(t) + s^{-1}x'_{A,J,\Gamma}(t)$ and replacing changes $X_{\phi,J,\Gamma}(i+1) - X_{\phi,J,\Gamma}(i)$ by their
expected value given $\mathcal{G}_i$. Intuitively, although the change in a single step may be far from its
expected value, over many steps a 'law of large numbers' will apply to the accumulated changes.
We also ignore two 'pathological' behaviours that will need to be dealt with later in the paper. As an
illustrative case, we start by counting open edges $|O(i)| = Q(i)/2$. When we choose the edge $e_{i+1}$ we
have

$$Q(i+1) = Q(i) - 2 - C_{e_{i+1}}(i) \approx q(t)n^2 - c(t)p^{-1}.$$

Since

$$Q(i+1) \approx q(t + 1/s)n^2 \approx (q(t) + s^{-1}q'(t))n^2 = q(t)n^2 + p^{-1}q'(t)$$

we have the equation $q'(t) = -c(t)$.

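To derive the differential equation for the general extension variable $X_{\phi,J,\Gamma}(i)$ we write

$$X_{\phi,J,\Gamma}(i+1) - X_{\phi,J,\Gamma}(i) = Y^+_{\phi,J,\Gamma}(i) - Y^-_{\phi,J,\Gamma}(i),$$

where $Y^+_{\phi,J,\Gamma}(i) \ge 0$ is the number of functions $f : V_\Gamma \to [n]$ in $\Xi_{\phi,J,\Gamma}(i+1) \setminus \Xi_{\phi,J,\Gamma}(i)$, and $Y^-_{\phi,J,\Gamma}(i) \ge 0$
is the number of functions $f : V_\Gamma \to [n]$ in $\Xi_{\phi,J,\Gamma}(i) \setminus \Xi_{\phi,J,\Gamma}(i+1)$. The term $Y^+_{\phi,J,\Gamma}(i)$ has contributions
corresponding to each edge e of J. A function f in $\Xi_{\phi,J\setminus e,\Gamma}(i)$ will be counted by $Y^+_{\phi,J,\Gamma}(i)$ if the
process chooses the edge $e_{i+1}$ equal to $f(e)$. Since $e_{i+1}$ is chosen uniformly at random among $Q(i)/2$
open edges, and $S_{A,J\setminus e} = p^{-1}S_{A,J}$, we can estimate

$$E\left(Y^+_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i\right) \approx \frac{2}{Q(i)}\sum_{e \in J} X_{\phi,J\setminus e,\Gamma}(i) \approx \frac{2}{q(t)pn^2}\sum_{e \in J} x_{A,J\setminus e,\Gamma}(t)S_{A,J}.$$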
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The term Y7j V {i) has contributions corresponding to each edge e of T\ J. A function / in H<^j 5 r(i) 
will be counted by ^~j r (^) if t ne process either chooses the edge ej + i equal to /(e) or /(e) becomes 
closed, i.e. /(e) € C(i + T). Thinking of e^+i as an ordered pair, the number of choices is 2 + Cy( e )(i), 
each occurring with probability Q(i) . Therefore 

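$$E(Y^-_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i) = \frac{1}{Q(i)} \sum_{e \in \Gamma \setminus J} \sum_{f \in \Xi_{\phi,J,\Gamma}(i)} (2 + C_{f(e)}(i)) \approx (e_\Gamma - e_J) \frac{c(t)p^{-1}}{q(t)n^2} x_{A,J,\Gamma}(t) S_{A,J}.$$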

On the other hand, we have 

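$$Y^+_{\phi,J,\Gamma}(i) - Y^-_{\phi,J,\Gamma}(i) = X_{\phi,J,\Gamma}(i+1) - X_{\phi,J,\Gamma}(i) \approx (x_{A,J,\Gamma}(t+s^{-1}) - x_{A,J,\Gamma}(t)) S_{A,J},$$

so we have the equation

$$q(t)x'_{A,J,\Gamma}(t) = 2\sum_{e \in J} x_{A,J\setminus e,\Gamma}(t) - (e_\Gamma - e_J)c(t)x_{A,J,\Gamma}(t). \qquad (1)$$

Note that the equation $q'(t) = -c(t)$ derived above is simply a special case of (1).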

To solve these equations we use the substitution $x_{A,J,\Gamma}(t) = q(t)^{e_\Gamma - e_J} z_\ell(t)$, where we will see that the functions $z_\ell(t)$ can be parameterised by a single number $\ell = e_J$. Then, since $q'(t) = -c(t)$, we have $q(t)x'_{A,J,\Gamma}(t) = q(t)^{e_\Gamma - e_J + 1} z'_\ell(t) - c(t)(e_\Gamma - e_J)q(t)^{e_\Gamma - e_J} z_\ell(t)$, which also equals

$$2\sum_{e \in J} x_{A,J\setminus e,\Gamma}(t) - (e_\Gamma - e_J)c(t)x_{A,J,\Gamma}(t) = 2\ell q(t)^{e_\Gamma - e_J + 1} z_{\ell-1}(t) - (e_\Gamma - e_J)c(t)q(t)^{e_\Gamma - e_J} z_\ell(t).$$

We deduce that $z'_\ell(t) = 2\ell z_{\ell-1}(t)$. Now we use the initial conditions that $x_{A,J,\Gamma}(0)$ is equal to 1 if $e_J = 0$, otherwise 0 (e.g. $q(0) = 1$). So $z_0(0) = 1$ and $z_\ell(0) = 0$ for $\ell > 0$. We obtain the solution $z_\ell(t) = (2t)^\ell$. Also $q'(t) = -c(t) = -\mathrm{aut}(H)^{-1} \sum_T x_{\phi_T,J_T,\Gamma_T}(t) = -\mathrm{aut}(H)^{-1} 4e_H(e_H-1)q(t)(2t)^{e_H-2}$. Integrating and substituting we conclude that

$$q(t) = e^{-2e_H \mathrm{aut}(H)^{-1} (2t)^{e_H-1}},$$

$$x_{A,J,\Gamma}(t) = (2t)^{e_J} e^{-2(e_\Gamma - e_J)e_H \mathrm{aut}(H)^{-1} (2t)^{e_H-1}} = (2t)^{e_J} q(t)^{e_\Gamma - e_J}.$$
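
These closed forms can be checked numerically. The sketch below integrates the reduced system $q' = -\alpha_H q z_{e_H-2}$, $z'_\ell = 2\ell z_{\ell-1}$, where $\alpha_H = 4e_H(e_H-1)/\mathrm{aut}(H)$ is the constant that appears when $c(t)$ is expressed through the substitution, using a hand-written fourth-order Runge-Kutta step, and compares the result with $q(t)$ and $z_\ell(t) = (2t)^\ell$. The helper names `rk4`, `make_system`, `initial_state` and `q_closed`, and the step count, are assumptions made purely for illustration.

```python
import math


def rk4(f, y0, t_end, n_steps):
    """Integrate the autonomous system y' = f(y) from 0 to t_end."""
    y = list(y0)
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = f([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y


def make_system(e_H, aut_H):
    """Right-hand side for the state (q, z_0, ..., z_{e_H - 1})."""
    alpha = 4 * e_H * (e_H - 1) / aut_H

    def f(y):
        q, z = y[0], y[1:]
        dq = -alpha * q * z[e_H - 2]          # q' = -c(t)
        dz = [0.0] + [2 * l * z[l - 1] for l in range(1, len(z))]
        return [dq] + dz

    return f


def initial_state(e_H):
    """q(0) = 1, z_0(0) = 1 and z_l(0) = 0 for l > 0."""
    return [1.0, 1.0] + [0.0] * (e_H - 1)


def q_closed(t, e_H, aut_H):
    return math.exp(-2 * e_H * (2 * t) ** (e_H - 1) / aut_H)


if __name__ == "__main__":
    # H = K_3 has e_H = 3 and aut(H) = 6; H = C_5 has e_H = 5 and aut(H) = 10.
    for name, e_H, aut_H in [("K_3", 3, 6), ("C_5", 5, 10)]:
        q_num, *z = rk4(make_system(e_H, aut_H), initial_state(e_H), 0.5, 1000)
        print(name, "numerical q:", q_num, "closed form:", q_closed(0.5, e_H, aut_H))
```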



Remark. As discussed above, we expect these random variables to evolve as they do in the unconstrained random graph $G(n,i)$. Thus it is natural to compare the process $G(i)$ at time $t$ to the random graph $G(n,\hat p)$, where $\hat p n^2/2 = i = tpn^2$, i.e. $\hat p = 2tp$. In $G(n,\hat p)$ we can define open/closed pairs and the variables $X_{\phi,J,\Gamma}$. For any ordered pair $uv$ in $[n]$, edge $ab$ of $H$ and function $f : V_H \to [n]$ with $f(a) = u$, $f(b) = v$ the edges of $f(H \setminus ab)$ will all be present in $G(n,\hat p)$ with probability $\hat p^{e_H-1}$. (For the purpose of this discussion we ignore the negligible contributions from functions $f$ that are not injective.) Given $uv$, there are $2e_H n^{v_H-2}$ such functions $f : V_H \to [n]$, corresponding to $2e_H \mathrm{aut}(H)^{-1} n^{v_H-2}$ distinct sets of edges. The probability that $uv$ is open should be approximately

$$(1 - \hat p^{e_H-1})^{2e_H \mathrm{aut}(H)^{-1} n^{v_H-2}} \approx \exp\left(-(2tp)^{e_H-1} 2e_H \mathrm{aut}(H)^{-1} n^{v_H-2}\right) = q(t).$$
Similar reasoning applies to general extension variables, and the equations we derived above agree with the corresponding equations for G(n,p). (See Spencer [32] for results on extension variables in this model.) We could use this correspondence as the starting point of our discussion and as a heuristic for the trajectories our variables follow, but this would not provide any insight into how to prove that our random variables actually follow the given trajectories. As we noted above, the calculations in this section play a central role in the proof of Theorem 1.4.
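
The comparison in the remark is easy to evaluate numerically. The sketch below, writing $\hat p = 2tp$ for the edge density of the comparison random graph, computes the heuristic probability that a pair is open and compares it with $q(t) = \exp(-2e_H \mathrm{aut}(H)^{-1}(2t)^{e_H-1})$. The function names and the sample values of $n$ and $t$ are illustrative assumptions; `log1p` is used only to avoid rounding error when the base is extremely close to 1.

```python
import math


def open_probability_gnp(n, t, v_H, e_H, aut_H):
    """Heuristic probability that a fixed pair is open in the comparison random graph."""
    p = n ** (-(v_H - 2) / (e_H - 1))
    p_hat = 2 * t * p
    # Number of distinct edge sets f(H \ ab) that would close the pair.
    copies = 2 * e_H * n ** (v_H - 2) / aut_H
    return math.exp(copies * math.log1p(-p_hat ** (e_H - 1)))


def q_limit(t, e_H, aut_H):
    return math.exp(-2 * e_H * (2 * t) ** (e_H - 1) / aut_H)


if __name__ == "__main__":
    # Triangle: v_H = 3, e_H = 3, aut(H) = 6.
    print(open_probability_gnp(10 ** 6, 0.5, 3, 3, 6), q_limit(0.5, 3, 6))
    # Five-cycle: v_H = 5, e_H = 5, aut(H) = 10.
    print(open_probability_gnp(10 ** 4, 0.5, 5, 5, 10), q_limit(0.5, 5, 10))
```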



3 Strictly balanced graphs and balanced extensions 

In this section we obtain some basic properties of our fixed strictly 2-balanced graph $H$. We also introduce a more general concept of strictly balanced extensions, and discuss the manner by which arbitrary extensions can be decomposed into a series of such extensions. First we recall the relevant definitions. We suppose that $H$ is strictly 2-balanced, in the sense that $v_H, e_H \ge 3$ and $\frac{e_H-1}{v_H-2} > \frac{e_K-1}{v_K-2}$ for all proper subgraphs $K$ of $H$ with $v_K \ge 3$. We also fix the parameter

$$p = n^{-\frac{v_H-2}{e_H-1}}.$$

For any graph $\Gamma$ we define the scaling of $\Gamma$ to be $S_\Gamma = n^{v_\Gamma} p^{e_\Gamma}$. The condition that $H$ is strictly 2-balanced can also be written as $S_K > S_H$ for all subgraphs $K$ of $H$ with $2 < v_K < v_H$, since $S_H = n^{v_H} p^{e_H} = pn^2$ and

$$S_K/S_H = n^{v_K-2} p^{e_K-1} = n^{(e_K-1)\left(\frac{v_K-2}{e_K-1} - \frac{v_H-2}{e_H-1}\right)} > 1.$$

Note that the scaling $S_\Gamma$ is always an integer power of $n^{1/(e_H-1)}$. It follows that the inequality $S_\Gamma > 1$ actually implies $S_\Gamma \ge n^{1/(e_H-1)}$ and similarly that $S_\Gamma < 1$ implies $S_\Gamma \le n^{-1/(e_H-1)}$.
The following lemma collects some simple properties of H and p. 
Lemma 3.1 

(i) If $d$ is the largest integer for which $np^{d-1} \ge 1$ then $H$ has minimum degree at least $d$.
(ii) We have $p \ge 1/n$, and so $H$ has minimum degree at least 2.
(iii) $H$ is a 2-connected graph, and if $\{x,y\}$ is a cutset then $xy \notin E_H$.

Proof. First note that $H$ cannot have a vertex $v$ of degree at most $d-1$: otherwise $S_H/S_{H \setminus v} = np^{d(v)} \ge 1$, which contradicts the fact that $H$ is strictly 2-balanced. We deduce that $H$ has minimum degree at least $d$. Next, suppose for a contradiction that $p < 1/n$. Then $e_H < v_H - 1$. However, for every connected subgraph $K$ of $H$ we have $e_K \ge v_K - 1$, so $\frac{e_K-1}{v_K-2} \ge 1 > \frac{e_H-1}{v_H-2}$, which contradicts the definition of $H$ being strictly 2-balanced. Therefore $p \ge 1/n$. Now suppose for a contradiction that $H$ is not 2-connected. Then we can write $V_H = X \cup Y$ so that $E_H = E_{H[X]} \cup E_{H[Y]}$ and $|X \cap Y| = 1$. Then $S_{H[X]} S_{H[Y]} = nS_H$, so without loss of generality we have $S_{H[X]} \le (nS_H)^{1/2}$, and since $S_H = pn^2$ we have $S_{H[X]}/S_H \le (n/S_H)^{1/2} = (1/pn)^{1/2} \le 1$. This contradicts $H$ being strictly 2-balanced, so $H$ is 2-connected. Finally, suppose that $\{x,y\}$ is a cutset, but that $xy \in E_H$. Write $V_H = X \cup Y$ so that $E_H = E_{H[X]} \cup E_{H[Y]}$ and $X \cap Y = \{x,y\}$. Then $S_{H[X]} S_{H[Y]} = pn^2 S_H = (pn^2)^2$, so without loss of generality $S_{H[X]} \le pn^2 = S_H$. But this contradicts $H$ being strictly 2-balanced, so $xy \notin E_H$. □

Recall that if $\Gamma$ is a graph and $A \subseteq V_\Gamma$ we define the scaling of the pair $(A,\Gamma)$ to be

$$S_{A,\Gamma} = p^{e_\Gamma - e_{\Gamma[A]}} n^{v_\Gamma - |A|}.$$

Note that $S_{A,\Gamma} = S_\Gamma/S_{\Gamma[A]}$. Also, for any $A \subseteq B \subseteq V_\Gamma$ we have $S_{B,\Gamma} = S_\Gamma/S_{\Gamma[B]} = S_\Gamma/S_{\Gamma[A]} \cdot S_{\Gamma[A]}/S_{\Gamma[B]} = S_{A,\Gamma}/S_{A,\Gamma[B]}$. We say that $(A,\Gamma)$ is strictly balanced if for any $A \subsetneq B \subsetneq V_\Gamma$ we have $S_{A,\Gamma} < S_{A,\Gamma[B]}$, or equivalently $S_{B,\Gamma} < 1$. For example, we can again rephrase our assumption that $H$ is strictly 2-balanced to say that for any edge $e = ab$ of $H$, with $A = \{a,b\}$ the pair $(A,H)$ is strictly balanced. Indeed, $S_{A,H} = p^{e_H-1} n^{v_H-2} = 1$, and for $A \subsetneq B \subsetneq V_H$ we have $S_{B,H} = S_H/S_{H[B]} < 1$.

We will apply results on strictly balanced extensions to arbitrary pairs $(A,\Gamma)$ using the extension series $A = B_0 \subsetneq B_1 \subsetneq \cdots \subsetneq B_d = V_\Gamma$ of $(A,\Gamma)$, which we construct by the following rule. If $(B_i,\Gamma)$ is not strictly balanced then $B_{i+1}$ is chosen to be a minimal set $C$ with $B_i \subsetneq C \subsetneq V_\Gamma$ that minimises $S_{B_i,\Gamma[C]} = n^{|C|-|B_i|} p^{e_{\Gamma[C]} - e_{\Gamma[B_i]}}$, otherwise we choose $B_{i+1} = B_d = V_\Gamma$. For more compact notation we also write $S_i^A(\Gamma) = S_{B_i,\Gamma[B_{i+1}]}$. We note the following properties of extension series.

• $(B_i, \Gamma[B_{i+1}])$ is strictly balanced.

• For $i \ge 1$ we have $S_i^A(\Gamma) = S_{B_i,\Gamma[B_{i+1}]} = S_{B_{i-1},\Gamma[B_{i+1}]} / S_{B_{i-1},\Gamma[B_i]} \ge 1$. Therefore the sequence $S_{A,\Gamma[B_i]} = \prod_{j=0}^{i-1} S_j^A(\Gamma)$ is non-decreasing. However, it is not necessarily true that the sequence of successive factors $S_i^A(\Gamma)$ is non-decreasing. For example, consider the $K_7$-free process, where $p = n^{-1/4}$, and let $\Gamma = K_4$. Choosing $A$ of size 2 we have $\Gamma[B_0] = K_2$, $\Gamma[B_1] = K_3$, $\Gamma[B_2] = K_4$ with $S_0^A(\Gamma) = np^2 = n^{1/2}$ and $S_1^A(\Gamma) = np^3 = n^{1/4}$.

• It is possible that $S_{A,\Gamma} < 1$ but some factors $S_i^A(\Gamma)$ are greater than 1. For example, consider the $C_5$-free process, where $p = n^{-3/4}$, and let $\Gamma$ be the graph consisting of $K_4$ plus an isolated vertex. Choosing $A$ to be 2 vertices of the $K_4$ we have $\Gamma[B_0] = K_2$, $\Gamma[B_1] = K_4$, $\Gamma[B_2] = \Gamma$, so $S_0^A(\Gamma) = n^2p^5 = n^{-7/4}$, $S_1^A(\Gamma) = n$ and $S_{A,\Gamma} = n^{-3/4}$.
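
The two examples above can be reproduced mechanically. The following sketch is a brute-force construction of the extension series for small graphs, working with the exponent of $n$ in each scaling (so that $p = n^{-\rho}$ with $\rho = (v_H-2)/(e_H-1)$ given as an exact fraction). The function names and the subset enumeration are assumptions made for illustration; the search is exponential in the number of vertices and is only meant for graphs of this size.

```python
from fractions import Fraction
from itertools import combinations


def scaling_exponent(edges, A, B, rho):
    """Exponent of n in S_{A, Gamma[B]} = n^{|B|-|A|} p^{e(B)-e(A)}, where p = n^{-rho}."""
    def e(S):
        return sum(1 for u, v in edges if u in S and v in S)
    return (len(B) - len(A)) - rho * (e(B) - e(A))


def extension_series(vertices, edges, A, rho):
    """Return the sets B_0, ..., B_d and the exponents of the factors S_i^A."""
    V = frozenset(vertices)
    series, factors = [frozenset(A)], []
    while series[-1] != V:
        B = series[-1]
        rest = sorted(V - B)
        # Sets C with B < C < V (proper inclusions), listed by increasing size.
        mids = [B | frozenset(extra)
                for k in range(1, len(rest))
                for extra in combinations(rest, k)]
        # (B, Gamma) is strictly balanced iff S_{C, Gamma} < 1 for every such C.
        if all(scaling_exponent(edges, C, V, rho) < 0 for C in mids):
            nxt = V
        else:
            best = min(scaling_exponent(edges, B, C, rho) for C in mids)
            # The first minimiser has smallest size, hence is inclusion-minimal.
            nxt = next(C for C in mids
                       if scaling_exponent(edges, B, C, rho) == best)
        factors.append(scaling_exponent(edges, B, nxt, rho))
        series.append(nxt)
    return series, factors


if __name__ == "__main__":
    K4 = list(combinations(range(4), 2))
    # K_7-free process: rho = 1/4, Gamma = K_4, A = two vertices.
    print(extension_series(range(4), K4, {0, 1}, Fraction(1, 4)))
    # C_5-free process: rho = 3/4, Gamma = K_4 plus the isolated vertex 4.
    print(extension_series(range(5), K4, {0, 1}, Fraction(3, 4)))
```

In the first call the factor exponents come out as $1/2$ and $1/4$, and in the second as $-7/4$ and $1$, matching the two bullet points.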

4 Union bounds 

In this section we collect some useful properties of the $H$-free process, assuming that the good events $\mathcal{G}_i$ hold. Recall that on $\mathcal{G}_i$ we have $Q(i) = (1 \pm e(t)/s_e)(q(t) \pm \theta(t)/s_e)n^2$, and $q(t) = \exp(-\Theta(t^{e_H-1}))$, where the constant in the $\Theta$-notation depends only on $H$. We analyse the process up to time
$t_{\max} = m/s = \mu(\log n)^{1/(e_H-1)}$, and choose $\mu > 0$ sufficiently small so that $e(t), q(t)^{-V} < n^{\epsilon}$. Since $s_e = n^{1/2e_H - \epsilon}$ we have $Q(i) > n^{2-\epsilon}$ (say) for $i \le m$. The following lemmas use this lower bound for $Q(i)$ and union bound estimates. We will state the bounds at time $m$, but they also hold at any time $i \le m$ by monotonicity. Our first lemma bounds the probability that $G(m)$ contains some fixed graph $F$.

Lemma 4.1 For any fixed graph $F$ on $[n]$, the probability that $\mathcal{G}_m$ holds and $G(m)$ contains $F$ is at most $p^{e_F} n^{2e_F\epsilon}$.

Proof. We take a union bound over all choices of steps $1 \le i_1, \ldots, i_{e_F} \le m$ where the $j$th edge of $F$ is chosen as the edge added to form $G(i_j)$ from $G(i_j - 1)$. Since edges are chosen uniformly at random from at least $n^{2-\epsilon}$ options, each choice has probability at most $n^{-(2-\epsilon)}$ conditional on the history of the process. Therefore $P(F \subseteq G(m)) \le m^{e_F} n^{-(2-\epsilon)e_F} \le p^{e_F} n^{2e_F\epsilon}$ (say), since $m = \mu(\log n)^{1/(e_H-1)} pn^2$. □

Given sets $A, B \subseteq [n]$, write $e(A,B)$ for the number of edges in $G(m)$ that have one endpoint in $A$ and the other in $B$. Our next lemma gives a bound for $e(A,B)$ holding with high probability for all choices of $A, B$ of specified size.

Lemma 4.2 For any $a, b \ge 1$, the probability $p_{a,b}$ that $\mathcal{G}_m$ holds and there exist sets $A, B \subseteq [n]$ such that $|A| = a$, $|B| = b$ and $e(A,B) > \max\{4\epsilon^{-1}(a+b), pabn^{2\epsilon}\}$ satisfies $p_{a,b} < n^{-(a+b)}$.

Proof. Write $x = \max\{4\epsilon^{-1}(a+b), pabn^{2\epsilon}\}$. We take a union bound over $\binom{n}{a}$ choices for $A$, $\binom{n}{b}$ choices for $B$, at most $\binom{ab}{x}$ ways to choose $x$ pairs with one endpoint in $A$ and the other in $B$, and less than $m^x$ choices of steps $1 \le i_1 < \cdots < i_x \le m$ in which to choose these pairs as edges of the process. Since edges are chosen uniformly at random from at least $n^{2-\epsilon}$ options, each choice has probability at most $n^{-(2-\epsilon)}$ conditional on the history of the process. Therefore we can estimate the probability by $p_{a,b} < \binom{n}{a}\binom{n}{b}\binom{ab}{x} m^x n^{-(2-\epsilon)x}$. Since $m = \mu(\log n)^{1/(e_H-1)} pn^2$, we have

$$\log p_{a,b} < a(\log(n/a) + 1) + b(\log(n/b) + 1) + x\left(\log(ab/x) + 1 + \log(pn^{\epsilon}) + \log\mu + \frac{\log\log n}{e_H-1}\right) < (a + b - \epsilon x/2)\log n,$$

since $x \ge pabn^{2\epsilon}$ and $n$ large imply that $-(\log(ab/x) + \log(pn^{\epsilon})) \ge \epsilon\log n \gg \log\log n$. Since $x \ge 4\epsilon^{-1}(a+b)$ the stated bound follows. □
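
To see the orders of magnitude involved, the union bound in this proof can be evaluated directly for sample parameters. The sketch below is only a numerical illustration: it assumes $H = K_3$ (so $e_H = 3$ and $p = n^{-1/2}$), and the values $n = 10^6$, $\epsilon = 0.1$, $\mu = 0.1$, $a = b = 100$ are arbitrary choices, as are the function names. Exact log-binomial coefficients are computed with `math.lgamma`, and $x$ is rounded up to an integer.

```python
import math


def log_binom(N, k):
    """Natural logarithm of the binomial coefficient C(N, k)."""
    return math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)


def log_union_bound(n, a, b, eps, p, m):
    """Log of C(n,a) C(n,b) C(ab,x) m^x n^{-(2-eps)x}, together with x."""
    x = math.ceil(max(4 * (a + b) / eps, p * a * b * n ** (2 * eps)))
    if x > a * b:
        # Fewer than x pairs exist between A and B, so the event is impossible.
        return -math.inf, x
    log_bound = (log_binom(n, a) + log_binom(n, b) + log_binom(a * b, x)
                 + x * math.log(m) - (2 - eps) * x * math.log(n))
    return log_bound, x


if __name__ == "__main__":
    n, eps, mu, e_H = 10 ** 6, 0.1, 0.1, 3
    p = n ** -0.5
    m = mu * math.log(n) ** (1 / (e_H - 1)) * p * n ** 2
    a = b = 100
    log_bound, x = log_union_bound(n, a, b, eps, p, m)
    print("x =", x)
    print("log of union bound:", log_bound)
    print("target -(a+b) log n:", -(a + b) * math.log(n))
```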

For $A \subseteq [n]$ let $D_{A,d}$ be the set of vertices $v$ such that $|N_{G(m)}(v) \cap A| \ge d$, i.e. in $G(m)$, $v$ has at least $d$ neighbours in $A$. We conclude this section by applying the previous lemma to give an upper bound for $|D_{A,d}|$.

Lemma 4.3 For any $8\epsilon^{-1} \le d \le a \le dp^{-1}n^{-2\epsilon}$, the probability that $\mathcal{G}_m$ holds and there exists $A \subseteq [n]$ with $|A| = a$ and $|D_{A,d}| > 8\epsilon^{-1}d^{-1}a$ is at most $n^{-a}$.

Proof. Set $B = D_{A,d}$, $b = |B|$ and consider the event that $b > 8\epsilon^{-1}d^{-1}a$. Since $e(A,B) \ge db$ and $d \ge 8\epsilon^{-1}$ we have $e(A,B) - 4\epsilon^{-1}b \ge db/2 > 4\epsilon^{-1}a$. Also, the bound $a \le dp^{-1}n^{-2\epsilon}$ implies that $e(A,B) \ge db \ge pabn^{2\epsilon}$. By Lemma 4.2 this event has probability at most $n^{-(a+b)} < n^{-a}$. □

5 Counting extensions 

In this section we see how to obtain general upper bounds on extension variables, assuming that the good events $\mathcal{G}_i$ hold. We will state the bounds at time $m$, but they also hold at any time $i \le m$ by monotonicity. Let $N_{\phi,J} = X_{\phi,J,J}(m)$: the number of extensions of a fixed embedding $\phi : A \to [n]$ to an embedding $f : J \to G(m)$, where $A \subseteq V_J$ is independent. Note that this is an upper bound for $X_{\phi,J,\Gamma}(m)$. The following lemma gives a good estimate on $N_{\phi,J}$ when the extension is strictly balanced.

Lemma 5.1 Suppose $(A,J)$ is strictly balanced and $\phi : A \to [n]$ is an injective map. Let $\omega(n)$ be any function such that $\omega(n) \to \infty$ as $n \to \infty$. On $\mathcal{G}_m$, with high probability we have $N_{\phi,J} < S_{A,J} n^{4e_J\epsilon}$ if $S_{A,J} \ge 1$ and $N_{\phi,J} < \omega(n)$ if $S_{A,J} < 1$.

Proof. We start by estimating the maximum number of vertex-disjoint extensions of $\phi$ to an embedding of $J$. Let $N'_{\phi,J}$ be the maximum number $s$ such that there are embeddings $f_1, \ldots, f_s$ of $J$ in $G(m)$, all restricting to $\phi$ on $A$, with $f_i(V_J \setminus A)$ and $f_j(V_J \setminus A)$ disjoint for all $1 \le i < j \le s$. We can estimate $P(N'_{\phi,J} \ge s)$ by a union bound over at most $s!^{-1}(n^{v_J-|A|})^s$ possible functions $f_1, \ldots, f_s$, where for each choice of functions, we can apply Lemma 4.1 to obtain an upper bound $p^{se_J} n^{2se_J\epsilon}$ on the probability that the graph $F = \bigcup_{i=1}^s f_i(J)$ is a subgraph of $G(m)$. Therefore

$$P(N'_{\phi,J} \ge s) \le s!^{-1}(n^{v_J-|A|})^s p^{se_J} n^{2se_J\epsilon} \le (3s^{-1} S_{A,J} n^{2e_J\epsilon})^s.$$

If $S_{A,J} \ge 1$ then we can set $s = S_{A,J} n^{3e_J\epsilon}$ to get a bound holding with failure probability much less than $\exp(-n^{\epsilon})$. On the other hand, if $S_{A,J} = p^{e_J} n^{v_J-|A|} < 1$ then, since $p = n^{-(v_H-2)/(e_H-1)}$, we in fact have $S_{A,J} \le n^{-1/(e_H-1)}$. Assuming that $\epsilon < (2e_Je_H)^{-1}$ we then have $S_{A,J} n^{2e_J\epsilon} < 1$, and we can set $s = \omega'(n)$ for any function $\omega'(n) \to \infty$ as $n \to \infty$ to get a bound holding with failure probability much less than $n^{-C}$ for any constant $C > 0$.

Now we argue by induction on $v_J - |A|$ to show the following bounds on $N_{\phi,J}$: if $S_{A,J} \ge 1$ then $N_{\phi,J} < S_{A,J} n^{3e_J\epsilon} \omega'(n)^{2(v_J-|A|)}$ and if $S_{A,J} < 1$ then $N_{\phi,J} < \omega'(n)^{2(v_J-|A|)}$. Then we can choose $\omega'(n)^{2(v_J-|A|)} < \omega(n) < n^{\epsilon}$ to obtain the bounds required for the lemma. Our base case is $v_J - |A| = 1$, when we have $N_{\phi,J} = N'_{\phi,J}$, and we can apply the bounds just shown for $N'_{\phi,J}$.

Next suppose $v_J - |A| > 1$. We claim that for any embedding $f$ counted by $N_{\phi,J}$ there are at most $\omega'(n)^{2(v_J-|A|)-1}$ embeddings $f'$ counted by $N_{\phi,J}$ with $f'(V_J \setminus A) \cap f(V_J \setminus A) \ne \emptyset$. To see this, consider any such $f'$ and let $B = \{b \in V_J : f'(b) \in f(V_J)\}$, so that $A \subsetneq B \subseteq V_J$. Let $\phi'$ be the restriction of $f'$ to $B$ and let $J'$ be the graph obtained from $J$ by deleting all edges inside $B$. Then, as noted above, $S_{B,J'} = S_{A,J}/S_{A,J[B]}$, and since $(A,J)$ is strictly balanced we have $S_{B,J'} < 1$. By induction hypothesis we have $N_{\phi',J'} < \omega'(n)^{2(v_J-|B|)}$. Also, there are at most $v_J^{|B|} \le v_J^{v_J}$ choices for $\phi'$, so at most $v_J^{v_J} \omega'(n)^{2(v_J-|B|)}$ embeddings $f'$ corresponding to this set $B$. Summing over all $A \subsetneq B \subseteq V_J$ we obtain at most $\omega'(n)^{2(v_J-|A|)-1}$ (say) such embeddings $f'$.

Finally, we can estimate $N_{\phi,J}$ by means of a maximum collection $F = \{f_1, \ldots, f_s\}$ of vertex-disjoint extensions of $\phi$ (so $|F| = N'_{\phi,J}$). Any extension $f$ counted by $N_{\phi,J}$ has a common image with some $f_i \in F$ outside of $A$, and for each $f_i \in F$ we have at most $\omega'(n)^{2(v_J-|A|)-1}$ such embeddings $f$. Therefore $N_{\phi,J} \le N'_{\phi,J} \omega'(n)^{2(v_J-|A|)-1}$. If $S_{A,J} \ge 1$ then $N'_{\phi,J} \le S_{A,J} n^{3e_J\epsilon}$ and so $N_{\phi,J} < S_{A,J} n^{3e_J\epsilon} \omega'(n)^{2(v_J-|A|)}$. On the other hand, if $S_{A,J} < 1$ then $N'_{\phi,J} \le \omega'(n)$ and so $N_{\phi,J} < \omega'(n)^{2(v_J-|A|)}$. This completes the proof. □

For general extensions $N_{\phi,J}$ may be considerably larger than $S_{A,J}$, but the following lemma gives a useful bound.

Lemma 5.2 On $\mathcal{G}_m$, with high probability we have $N_{\phi,J} < n^{4e_J\epsilon} \max_{A \subseteq B \subseteq V_J} S_{B,J}$.

Proof. Consider the extension series $A = B_0 \subsetneq B_1 \subsetneq \cdots \subsetneq B_d = V_J$. We repeatedly apply Lemma 5.1 to bound the number of extensions in each step of the series. At the first step we either have $S_0^A(J) < 1$ and so $N_{\phi,J[B_1]} < \omega(n)$ or $S_0^A(J) \ge 1$ and so $N_{\phi,J[B_1]} < S_0^A(J) n^{4e_{J[B_1]}\epsilon}$. At subsequent steps $i \ge 1$ we have $S_i^A(J) \ge 1$, so for each injection $\phi' : B_i \to [n]$ the number of extensions of $\phi'$ to an embedding of $J[B_{i+1}]$ is less than $S_i^A(J) n^{4(e_{J[B_{i+1}]} - e_{J[B_i]})\epsilon}$. Multiplying these bounds and using $S_{A,J} = \prod_{i=0}^{d-1} S_i^A(J)$ gives a bound equal to either $n^{4e_J\epsilon} S_{A,J}$ when $S_0^A(J) \ge 1$ or $\omega(n) n^{4(e_J - e_{J[B_1]})\epsilon} S_{B_1,J}$ when $S_0^A(J) < 1$. By definition of the extension series, $\max_{A \subseteq B \subseteq V_J} S_{B,J}$ is either $S_{A,J}$ when $S_0^A(J) \ge 1$ or $S_{B_1,J}$ when $S_0^A(J) < 1$. Also, we may assume that $e_{J[B_1]} \ge 1$ (otherwise $E_J$ is empty), so we can choose $\omega(n) < n^{\epsilon}$ to obtain the required bound. □

Remark. In both of the preceding lemmas we can choose $\omega(n) = n^{c\epsilon}$ for some constant $c > 0$ to make the failure probability exponentially small.

We say that the pair $(A,J)$ is dense if $S_0^A(J) = S_{A,J[B_1]} \ge 1$ and strictly dense if $S_0^A(J) > 1$ (and so $S_0^A(J) \ge n^{1/(e_H-1)}$). Since $S_i^A(J) \ge 1$ for $i \ge 1$, for a dense pair we have $\max_{A \subseteq B \subseteq V_J} S_{B,J} = S_{A,J}$, so the previous lemma gives an approximate upper bound of $S_{A,J}$ for $N_{\phi,J}$. Note that if $(A,J)$ is strictly dense then so is $(A,J')$ for any subgraph $J'$ of $J$, since we have $S_{A,J'[B]} \ge S_{A,J[B]} > 1$ for any $B$ with $A \subsetneq B \subseteq V_J$. The same argument shows that if $J$ is a subgraph of $H$ with $e_J \le e_H - 2$ and $A = \{u,v\}$, where $uv \in E_H \setminus E_J$, then $(A,J)$ is strictly dense.

We conclude this section by showing that adding an edge to a strictly dense pair gives a significant improvement on the bound for $N_{\phi,J}$.

Lemma 5.3 Suppose that $(A,J)$ is a strictly dense pair, $a, b$ are vertices of $J$ with $ab \notin E_J$ and $\{a,b\} \not\subseteq A$, and $J' = J \cup \{ab\}$ is obtained by adding the edge $ab$ to $J$. Then $\max_{A \subseteq B \subseteq V_{J'}} S_{B,J'} < S_{A,J}$, and so on $\mathcal{G}_m$, with high probability we have $N_{\phi,J'} < n^{-1/(e_H-1) + 4e_{J'}\epsilon} S_{A,J}$.

Proof. Choose $B$ with $A \subseteq B \subseteq V_J$ maximising $S_{B,J'}$. If $B = A$ we have $S_{B,J'} = pS_{A,J}$, whereas if $B \ne A$ we have $S_{B,J'} \le S_{B,J} = S_{A,J}/S_{A,J[B]} < S_{A,J}$, as $(A,J)$ is strictly dense. Either way we have $S_{B,J'} \le n^{-1/(e_H-1)} S_{A,J}$, since it is an integer power of $n^{1/(e_H-1)}$, so the bound on $N_{\phi,J'}$ follows from Lemma 5.2. □



6 Closure fidelity 

Recall that for an ordered pair $uv \in O(i)$, we write $C_{uv}(i)$ for the set of ordered pairs $xy \in O(i)$ that would become closed, i.e. belong to $C(i+1)$, if at time $i+1$ the process chooses $uv$ as the edge $e_{i+1}$. By definition of $C(i+1)$ this means that adding $uv$ and $xy$ to $G(i)$ would create a copy of $H$. Also, since $uv$ and $xy$ are open, any such copy of $H$ must use both $uv$ and $xy$. In principle there could be many such copies of $H$, but we will show in this section that in fact this is not the case, and moreover, by counting these copies of $H$ we obtain an accurate estimate for the number of pairs closed by $uv$.

We frequently need to estimate the number of overlapping extensions of two pairs $(A_1,J_1)$ and $(A_2,J_2)$, so we will introduce some notation for this situation. Recall that a graph $W$ is a join of two graphs $W_1$ and $W_2$ if it has subgraphs $J_1$ isomorphic to $W_1$ and $J_2$ isomorphic to $W_2$ such that $V_W = V_{J_1} \cup V_{J_2}$ and $E_W = E_{J_1} \cup E_{J_2}$. For convenient notation we use names for vertices in $J_1$ interchangeably with their corresponding vertices in $W_1$, and similarly for $J_2$ and $W_2$. Whenever we use this notation the sets $A_1$ and $A_2$ will be independent and we will write $C = V_{J_1} \cap V_{J_2}$.

We need some further notation for describing the possibilities by which a pair $uv$ can close a pair $xy$. There must be a subgraph $J$ obtained by deleting two edges $ab$ and $cd$ from $H$ and an injective map $f : V_H \to [n]$ such that $f(a) = u$, $f(b) = v$, $f(c) = x$, $f(d) = y$ and $f(e) \in E(i)$ for every edge $e$ of $J$. The map $f$ is counted by $X_{\phi_T,J_T,\Gamma_T}(i)$, where given such a quadruple $T = (a,b,c,d)$, we write $\Gamma_T = H \setminus ab$, $J_T = H \setminus \{ab,cd\}$ and define $\phi_T$ by $\phi_T(a) = u$ and $\phi_T(b) = v$.

For the sake of an argument needed in the proof of Lemma 11.1 we extend the definition of $C_{uv}(i)$ to allow the case when $uv \in C(i)$ is a closed pair: we define it as the number of pairs $xy$ such that adding $uv$ and $xy$ to $G(i)$ creates a copy of $H$ containing both $uv$ and $xy$.

Lemma 6.1 With high probability, for every $1 \le i \le m$ and ordered pair $uv \in O(i) \cup C(i)$, assuming $\mathcal{G}_i$, we have $|C_{uv}(i)| = \mathrm{aut}(H)^{-1} \sum_T X_{\phi_T,J_T,\Gamma_T}(i) \pm n^{-1/e_H} p^{-1}$, where the sum is over quadruples $T = (a,b,c,d)$ such that $ab$ and $cd$ are distinct (but not necessarily disjoint) edges of $H$.

Proof. Let $P$ be the set of ordered pairs $xy$ for which there exist (at least) two embeddings $f_1, f_2$ of $H$ in $G(i) \cup \{uv,xy\}$ with $f_1(E_H) \ne f_2(E_H)$ such that both embedded copies $f_1(H)$ and $f_2(H)$ use the edges $uv$ and $xy$. Given any $xy \in P$ we fix any two such embeddings $f_1$ and $f_2$. Let $W$ be a graph isomorphic to $(f_1(H) \cup f_2(H)) \setminus \{uv,xy\}$ and write $a, b, c, d$ for the vertices in $W$ corresponding to $u, v, x, y$ respectively. Note that these are not necessarily distinct, but there are at least 3 distinct vertices in the list, since $\{u,v\} \ne \{x,y\}$. Let $\phi$ be the function defined by $\phi(a) = u$ and $\phi(b) = v$. We bound $P$ by estimating, for all such $W$, the number $N_{\phi,W}$ of embeddings of $W$ in $G(i)$ where $a$ is mapped to $u$ and $b$ to $v$.

There are two cases, according to whether or not we have $f_1(V_H) = f_2(V_H)$. If $f_1(V_H) = f_2(V_H)$ then, since $f_1(E_H) \ne f_2(E_H)$, $W$ is obtained from a subgraph $J = H \setminus \{ab,cd\}$ of $H$ by adding at least one edge. As noted above, $(ab,J)$ is strictly dense, and so by Lemma 5.3 we have $N_{\phi,W} < n^{-1/(e_H-1) + 4e_W\epsilon} p^{-1}$. Now suppose that $f_1(V_H) \ne f_2(V_H)$. We need to estimate $N_{\phi,W}$ where $W$ is the join of $J_1 = f_1(H) \setminus \{uv,xy\}$ and $J_2 = f_2(H) \setminus \{uv,xy\}$. With the above notation we have $A_1 = A_2 = \{a,b\}$, and $C = V_{J_1} \cap V_{J_2}$ contains $\{a,b\}$ and $\{c,d\}$, so $C \setminus A_1$ and $C \setminus A_2$ are non-empty. Choose $B$ with $A_1 \cup A_2 \subseteq B \subseteq V_W$ maximising $S_{B,W}$ and write $B_1 = B \cap V_{J_1}$, $B_2 = B \cap V_{J_2}$. We consider three subcases according to $B_1$ and $B_2$. The first subcase is $B_1 \cup C \ne V_{J_1}$. Then we have $S_{B_1 \cup C,J_1} = S_{B_1 \cup C,H} < 1$, as $\{c,d\} \subseteq C$ and $H$ is strictly 2-balanced. Also $S_{B_2,J_2} \le S_{A_2,J_2}$, since $(A_2,J_2)$ is (strictly) dense, so $S_{B,W} = S_{B_2,J_2} S_{B_1 \cup C,J_1} < S_{A_2,J_2} = p^{-1}$. The second subcase is $B_2 \cup C \ne V_{J_2}$, when a similar argument gives $S_{B,W} = S_{B_1,J_1} S_{B_2 \cup C,J_2} < S_{A_1,J_1} = p^{-1}$. Finally, the third subcase is $B_1 \cup C = V_{J_1}$ and $B_2 \cup C = V_{J_2}$. Then $V_{J_1} \setminus (A_1 \cup C)$ and $V_{J_2} \setminus (A_2 \cup C)$ are non-empty, since $f_1(V_H) \ne f_2(V_H)$. Since $(A_1,J_1)$ is strictly dense we have $S_{B_1,J_1} < S_{A_1,J_1} = p^{-1}$, so $S_{B,W} = S_{B_1,J_1} S_{B_2 \cup C,J_2} = S_{B_1,J_1} < p^{-1}$. In all cases we have $S_{B,W} < p^{-1}$, so $S_{B,W} \le n^{-1/(e_H-1)} p^{-1}$, since it is an integer power of $n^{1/(e_H-1)}$. Now Lemma 5.2 gives $N_{\phi,W} < n^{-1/(e_H-1) + 4e_W\epsilon} p^{-1}$. Summing over less than $|V_H|^{2|V_H|}$ (say) choices of $W$ we obtain a bound $|P| < n^{-1/(e_H-1/2)} p^{-1}$, say.

To finish the proof we calculate the number of ordered pairs $xy \notin P$ counted by $C_{uv}(i)$. For each such pair $xy$ there is a unique copy $H_c$ of $H$ in $G(i) \cup \{uv,xy\}$. For each quadruple $T = (a,b,c,d)$ in $H$ such that there is an isomorphism $f : H \to H_c$ with $f(a) = u$, $f(b) = v$, $f(c) = x$, $f(d) = y$ we count $xy$ by $X_{\phi_T,J_T,\Gamma_T}(i)$. Also, any other such quadruple $T' = (a',b',c',d')$ and isomorphism $f' : H \to H_c$ with $f'(a') = u$, $f'(b') = v$, $f'(c') = x$, $f'(d') = y$ corresponds to the automorphism $f^{-1} \circ f'$ of $H$, and this is a one-to-one correspondence. Therefore we can estimate the number of ordered pairs $xy \notin P$ that close $uv$ by $\mathrm{aut}(H)^{-1} \sum_T X_{\phi_T,J_T,\Gamma_T}(i) \pm |P|$. Including the pairs in $P$, we can estimate $|C_{uv}(i)|$ by $\mathrm{aut}(H)^{-1} \sum_T X_{\phi_T,J_T,\Gamma_T}(i) \pm n^{-1/e_H} p^{-1}$, say. This completes the proof. □

Note that the extension variables which appear in Lemma 6.1 are trackable: they satisfy condition (b) in the definition, since $uv \notin E(i)$. Substituting the formulae $X_{\phi_T,J_T,\Gamma_T}(i) = (1 \pm e(t)/s_e)((2t)^{e_H-2} q(t) \pm \theta(t)/s_e)p^{-1}$ and recalling that $s_e = n^{1/2e_H - \epsilon} < n^{1/e_H}$ we obtain the following estimate.

Corollary 6.2 With high probability, for every $1 \le i \le m$ and ordered pair $uv \in O(i) \cup C(i)$, assuming $\mathcal{G}_i$, we have

$$|C_{uv}(i)| = (1 \pm 2e(t)/s_e)(\alpha_H (2t)^{e_H-2} q(t) \pm e(t)/s_e)p^{-1},$$

where $\alpha_H = 4e_H(e_H-1)/\mathrm{aut}(H)$.
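
For concrete graphs the constants in this corollary are easy to tabulate. The sketch below counts automorphisms of a small graph by brute force over all vertex permutations and returns $\mathrm{aut}(H)$, the exponent $\rho = (v_H-2)/(e_H-1)$ for which $p = n^{-\rho}$, and $\alpha_H$. The function names are hypothetical, and the brute-force count is an assumption suited only to graphs with a handful of vertices.

```python
from fractions import Fraction
from itertools import permutations


def aut_count(vertices, edges):
    """Number of automorphisms of the graph, by brute force over permutations."""
    vertices = list(vertices)
    E = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(vertices):
        sigma = dict(zip(vertices, perm))
        # A vertex bijection that maps every edge to an edge is an automorphism.
        if all(frozenset((sigma[u], sigma[v])) in E for u, v in edges):
            count += 1
    return count


def closure_constants(vertices, edges):
    """Return (aut(H), rho, alpha_H) with p = n^{-rho}."""
    v_H, e_H = len(list(vertices)), len(edges)
    aut = aut_count(vertices, edges)
    rho = Fraction(v_H - 2, e_H - 1)
    alpha = Fraction(4 * e_H * (e_H - 1), aut)
    return aut, rho, alpha


if __name__ == "__main__":
    K3 = [(0, 1), (1, 2), (0, 2)]
    C5 = [(i, (i + 1) % 5) for i in range(5)]
    K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
    for name, n_vertices, edges in [("K_3", 3, K3), ("C_5", 5, C5), ("K_4", 4, K4)]:
        print(name, closure_constants(range(n_vertices), edges))
```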

7 Martingale estimates: the differential equations method 

Our main tool for establishing concentration of random variables will be the following versions of the Azuma-Hoeffding inequality, Lemmas 6 and 7 from [7]. First we need some definitions. Suppose we have a sequence of random variables $X_0, X_1, \ldots$ and a filtration $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ (which will always be the natural filtration given by the process). We say that the sequence $X_0, X_1, \ldots$ is a martingale if $E(X_{i+1} \mid \mathcal{F}_i) = X_i$ for $i \ge 0$. We say it is a submartingale if $E(X_{i+1} \mid \mathcal{F}_i) \ge X_i$ for $i \ge 0$ or a supermartingale if $E(X_{i+1} \mid \mathcal{F}_i) \le X_i$ for $i \ge 0$. We say that a sequence of random variables $X_0, X_1, \ldots$ is $(\eta,N)$-bounded, for some $\eta, N > 0$, if $X_i - \eta \le X_{i+1} \le X_i + N$ for all $i \ge 0$. In our application below we consider sequences of random variables $X_0, X_1, \ldots$ where the difference sequence $D_i = X_{i+1} - X_i$ satisfies $0 \le D_i \le N$ and $E D_i = (1 \pm e_i)d_i$ for some $d_i \le \eta/2$ and a small error term $0 < e_i < 1$. We will define $X_i^+ = \sum_{j<i} (D_j - (1-e_j)d_j)$, and $X_i^- = \sum_{j<i} (D_j - (1+e_j)d_j)$. Then each of $X_i^{\pm}$ is $(\eta,N)$-bounded, $X_i^+$ is a submartingale and $X_i^-$ is a supermartingale. We refer to $X_i^{\pm}$ as a martingale pair with parameters $(\eta,N)$.

Lemma 7.1 Suppose $\eta \le N/10$, $m \ge 1$, $a > 0$ and $X_0, X_1, \ldots$ is an $(\eta,N)$-bounded submartingale. Then $P(X_m \le X_0 - a) \le e^{-a^2/3\eta mN}$.

Lemma 7.2 Suppose $\eta \le N/10$, $m \ge 1$, $0 < a < \eta m/10$ and $X_0, X_1, \ldots$ is an $(\eta,N)$-bounded supermartingale. Then $P(X_m \ge X_0 + a) \le e^{-a^2/3\eta mN}$.
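
To get a feel for the asymmetric boundedness condition, the following Monte Carlo sketch simulates a simple $(\eta,N)$-bounded martingale (which is in particular a submartingale) whose steps are $+N$ with probability $\eta/(\eta+N)$ and $-\eta$ otherwise, and compares the empirical frequency of a lower deviation of size $a$ with the bound $e^{-a^2/3\eta mN}$ of Lemma 7.1. The specific step distribution, parameter values and function names are assumptions chosen for illustration, and the simulation is of course no substitute for the proof in [7].

```python
import math
import random


def lower_tail_frequency(eta, N, m, a, trials, seed=0):
    """Empirical frequency of X_m <= X_0 - a for a mean-zero (eta, N)-bounded walk."""
    rng = random.Random(seed)
    p_up = eta / (eta + N)  # chosen so that each step has mean zero
    hits = 0
    for _ in range(trials):
        x = 0.0
        for _ in range(m):
            x += N if rng.random() < p_up else -eta
        if x <= -a:
            hits += 1
    return hits / trials


def azuma_type_bound(eta, N, m, a):
    """The bound exp(-a^2 / (3 eta m N)) appearing in Lemmas 7.1 and 7.2."""
    return math.exp(-a * a / (3 * eta * m * N))


if __name__ == "__main__":
    eta, N, m, a = 1.0, 10.0, 500, 200.0
    print("empirical frequency:", lower_tail_frequency(eta, N, m, a, trials=1000))
    print("bound:              ", azuma_type_bound(eta, N, m, a))
```

Note that the exponent involves the product $\eta N$ rather than $N^2$, which is what makes these lemmas stronger than the symmetric Azuma-Hoeffding bound when $\eta$ is much smaller than $N$.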

We now come to the formulation of the differential equations method. Although it is technically involved, the idea behind it is quite simple. We have a collection of sequences of random variables, and would like to prove that certain asymptotic approximations hold with high probability at each step of each sequence. The asymptotic formulae are heuristically derived by considering the one-step expected changes in these variables. We let $\mathcal{G}_i$ be the event that all formulae hold up to step $i$. If, conditional on $\mathcal{G}_i$, the expected change of a random variable from step $i$ to step $i+1$ is close to what it should be for these formulae to hold, and we also have a useful absolute bound for these one-step changes, then we can apply martingale estimates to show that the event $\mathcal{G}_i$ indeed holds with high probability. We recommend the survey of Wormald [36] for an introduction to this method, and a comparison of Lemma 7.3 below with Theorem 5.1 in Wormald [36] may be helpful. We also note that Seierstad [29, 30] has recently given improved large deviation bounds and a central limit theorem for the method under certain general criteria. One difference in our theorem is that we phrase our result in terms of a known smooth solution to a system of differential equations, and thus side-step the issue of the existence of a solution. However, the important difference is in the hypothesis for the bounds on the one-step changes of the variables: by using Lemmas 7.1 and 7.2 we can make do with much weaker estimates than those needed to apply the general result from [36].

Set-up for Lemma 7.3. Suppose we have a stochastic graph process defined on the vertex set $[n]$, where $n$ is large. Let $r$ be a fixed positive integer, and for each $j \in [r]$ let $k_j, S_j$ be parameters (which can depend on $n$). Suppose that for each $j \in [r]$ and $A \in \binom{[n]}{k_j}$ there is a sequence of random variables $X_{j,A}(i)$, defined for $i = 0, \ldots, m$ and measurable with respect to the underlying graph process. We suppose further that

$$X_{j,A}(i+1) - X_{j,A}(i) = Y^+_{j,A}(i) - Y^-_{j,A}(i),$$

where $Y^+_{j,A}(i), Y^-_{j,A}(i) \ge 0$. We relate these sequences of random variables to functions on $[0,\infty)$ by introducing $t = i/s$ for some function $s = s(n)$ that goes to infinity. We hope to find a collection $x_j(t)$ of continuous functions such that

$$X_{j,A}(i) \approx x_j(t) S_j$$

for all $j \in [r]$, $A \in \binom{[n]}{k_j}$ and $i = 0, \ldots, m$. Note that in our application $i$ will be the number of edges that have been added, and we can think of $s$ as the time-scaling for the underlying process. We can think of $1 \le j \le r$ as the 'type' of a random variable and the set $A$ as giving its 'position' in the graph. The parameter $S_j$ is the size-scaling for the $j$-th type of random variable.

Now we will formally state our lemma. Note that for technical reasons we also allow the introduction of an additional sequence $\mathcal{H}_i$ of high probability events.

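Lemma 7.3 Let $0 < \epsilon < 1$ and $c, C > 0$ be constants, and suppose that for each $j \in [r]$ we have a parameter $s_j = s_j(n)$, and functions $x_j(t)$, $e_j(t)$, $\theta_j(t)$, $\gamma_j(t)$ that are smooth and non-negative for $t \ge 0$. For $i^* = 1, \ldots, m$ let $\mathcal{G}_{i^*}$ be the event that

$$X_{j,A}(i) = (1 \pm e_j(t)/s_j)(x_j(t) \pm \theta_j(t)/s_j)S_j$$

for all $1 \le i \le i^*$, $1 \le j \le r$ and $A \in \binom{[n]}{k_j}$. Suppose that also there is a decreasing sequence of events $\mathcal{H}_i$, $1 \le i \le m$ such that $P(\mathcal{H}_m \mid \mathcal{G}_m) \to 1$ as $n \to \infty$, and that the following conditions hold:

1. (trend hypothesis) When conditioning on $\mathcal{G}_i \wedge \mathcal{H}_i$ we have

$$E(Y^{\pm}_{j,A}(i) \mid G(i)) = (y_j^{\pm}(t) \pm h_j(t)/4s_j)S_j/s$$

for all $j \in [r]$ and $A \in \binom{[n]}{k_j}$, where $y_j^{\pm}(t)$ and $h_j(t)$ are smooth non-negative functions such that $x'_j(t) = y_j^+(t) - y_j^-(t)$ and $h_j(t) = (e_jx_j + \gamma_j)'(t)$;

2. (boundedness hypothesis) For each $j \in [r]$, conditional on $\mathcal{G}_i \wedge \mathcal{H}_i$ we have

$$Y^{\pm}_{j,A}(i) < \frac{S_j}{s_j^2 k_j n^{\epsilon}};$$

3. (initial condition) for all $j \in [r]$ we have $e_j(0) = \gamma_j(0) = 0$; and $X_{j,A}(0) = S_jx_j(0)$ for all $A \in \binom{[n]}{k_j}$;

4. We have $n^{3\epsilon} < s < m < n^2$, $s \ge 40Cs_j^2k_jn^{\epsilon}$, $n^{2\epsilon} \le s_j < n^{-\epsilon}s$,

$$\inf_{t \ge 0} \left(\theta_j(t) + e_j(t)x_j(t)/2 - \gamma_j(t)/2\right) > c,$$

$$\sup_{t \ge 0} |y_j^{\pm}(t)| < C, \quad \sup_{t \ge 0} |x'_j(t)| < C, \quad \int_0^{\infty} |x''_j(t)|\,dt < C,$$

$$\sup_{t \ge 0} |h_j(t)| < n^{\epsilon}, \quad \int_0^{\infty} |h'_j(t)|\,dt < n^{\epsilon}.$$

Then $P(\mathcal{G}_m \wedge \mathcal{H}_m) \to 1$ as $n \to \infty$.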

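Proof. On the event $\mathcal{G}_i \wedge \mathcal{H}_i$ we define

$$Y^{\pm_1\pm_2}_{j,A}(i) = Y^{\pm_1}_{j,A}(i) - (y_j^{\pm_1}(t) \mp_2 h_j(t)/4s_j)S_j/s.$$

(Recall our convention that this is shorthand for 4 separate sequences of variables, one for each way of choosing signs for $\pm_1$ and for $\pm_2$.) If any event $\mathcal{G}_i$ or $\mathcal{H}_i$ fails we define all $Y^{\pm_1\pm_2}_{j,A}(i')$ to be 0 for $i' > i$. Define

$$Z^{\pm_1\pm_2}_{j,A}(i) = \sum_{i'=0}^{i} Y^{\pm_1\pm_2}_{j,A}(i'), \quad N_j = \frac{S_j}{s_j^2 k_j n^{\epsilon}} \quad \text{and} \quad \eta_j = 4CS_j/s.$$

Using the bounds $|h_j(t)| < n^{\epsilon}$, $s_j \ge n^{2\epsilon}$, $|y_j^{\pm}(t)| < C$ we see that $Z^{+\pm}_{j,A}(i)$ and $Z^{-\pm}_{j,A}(i)$ are martingale pairs with parameters $(\eta_j, N_j + \eta_j)$. For example, the sequence $Z^{++}_{j,A}$ has increments $Z^{++}_{j,A}(i+1) - Z^{++}_{j,A}(i) = Y^{++}_{j,A}(i) = Y^+_{j,A}(i) - (y_j^+(t) - h_j(t)/4s_j)S_j/s$, so it is a submartingale by the trend hypothesis, and its increments are bounded above by $N_j + n^{-\epsilon}S_j/4s < N_j + \eta_j$ by the boundedness hypothesis and below by $-CS_j/s > -\eta_j$. (The other cases are similar.)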

Next we need the Euler-Maclaurin summation formula (see [5]), which is as follows. Suppose $f(t)$ is a smooth function and $a$ is a natural number. Then $I = \int_0^a f(i)\,di$ can be approximated by $S = \frac{1}{2}f(0) + f(1) + \cdots + f(a-1) + \frac{1}{2}f(a)$ with error $|S - I| \le \int_0^a |f'(i)|\,di$. We apply the formula to $f(i) = x'_j(t(i))$ for any $j \in [r]$ and $a = i^*$ with $1 \le i^* \le m$. Write $t^* = i^*/s$. Then

$$I = \int_0^{i^*} x'_j(t(i))\,di = \int_0^{t^*} x'_j(\tau)s\,d\tau = s(x_j(t^*) - x_j(0))$$

and

$$|S - I| \le \frac{1}{s}\int_0^{i^*} |x''_j(t(i))|\,di = \int_0^{t^*} |x''_j(\tau)|\,d\tau < C,$$

so

$$\left| x_j(t^*) - x_j(0) - \frac{1}{s}\sum_{i=0}^{i^*} x'_j(t(i)) \right| \le \frac{|x'_j(0)|}{2s} + \frac{|x'_j(t^*)|}{2s} + \frac{1}{s}\int_0^{t^*} |x''_j(\tau)|\,d\tau < \frac{3C}{s}.$$

We can rewrite this as

$$\frac{1}{s}\sum_{i=0}^{i^*} x'_j(t(i))S_j = \left(x_j(t^*) - x_j(0) \pm \frac{3C}{s}\right)S_j. \qquad (2)$$



Similarly, our assumptions on $h_j$ and the initial conditions $e_j(0) = \gamma_j(0) = 0$ give $|e_j(t^*)x_j(t^*) + \gamma_j(t^*) - \sum_{i=0}^{i^*} h_j(t(i))/s| < 3n^{\epsilon}/s$, which we can rewrite as

$$\sum_{i=0}^{i^*} h_j(t(i))/4s_j \cdot S_j/s = (e_j(t^*)x_j(t^*) + \gamma_j(t^*) \pm 3n^{\epsilon}/s)S_j/4s_j. \qquad (3)$$

Now we will estimate the probability that any event $\mathcal{G}_i$ fails. We can restrict attention to events where all $\mathcal{H}_i$ hold, as by assumption they all hold with high probability. Fix $1 \le j \le r$, $A \in \binom{[n]}{k_j}$, $1 \le i^* \le m$, $t^* = i^*/s$. Consider the event that $i^*$ is the first step at which $\mathcal{H}_{i^*}$ holds but $\mathcal{G}_{i^*}$ fails and that it fails for the variable $X_{j,A}(i^*)$. One possibility is that $X_{j,A}(i^*) > (1 + e_j(t^*)/s_j)(x_j(t^*) + \theta_j(t^*)/s_j)S_j$. By definition

$$X_{j,A}(i^*) - X_{j,A}(0) - \sum_{i=0}^{i^*} x'_j(t(i))S_j/s = \sum_{i=0}^{i^*} \left(Y^+_{j,A}(i) - y_j^+(t)S_j/s - Y^-_{j,A}(i) + y_j^-(t)S_j/s\right)$$

$$= Z^{+-}_{j,A}(i^*) - Z^{-+}_{j,A}(i^*) + 2\sum_{i=0}^{i^*} h_j(t(i))/4s_j \cdot S_j/s.$$

Applying equation (2) gives

$$Z^{+-}_{j,A}(i^*) - Z^{-+}_{j,A}(i^*) + 2\sum_{i=0}^{i^*} h_j(t(i))/4s_j \cdot S_j/s > (e_j(t^*)x_j(t^*) + \theta_j(t^*) + e_j(t^*)\theta_j(t^*)/s_j - 3Cs_j/s)S_j/s_j.$$

Then equation (3), $n^{2\epsilon} \le s_j < n^{-\epsilon}s$ and $\theta_j(t^*) + e_j(t^*)x_j(t^*)/2 - \gamma_j(t^*)/2 > c$ give

$$Z^{+-}_{j,A}(i^*) - Z^{-+}_{j,A}(i^*) > (e_j(t^*)x_j(t^*)/2 - \gamma_j(t^*)/2 + \theta_j(t^*) - (n^{\epsilon} + 3Cs_j)/s)S_j/s_j > cS_j/2s_j.$$

We deduce that $Z^{+-}_{j,A}(i^*) > cS_j/4s_j$ or $Z^{-+}_{j,A}(i^*) < -cS_j/4s_j$. Now we apply Lemmas 7.1 and 7.2 with $a = cS_j/4s_j$, which is valid using our assumptions $s \ge 40Cs_j^2k_jn^{\epsilon}$, $s_j \ge n^{2\epsilon}$ and $m > s$ which give $\eta_j \le N_j/10$ and $a < \eta_jm/10$. We deduce that these events have probability at most


$$\exp\left(-(cS_j/4s_j)^2/3\eta_jm(N_j+\eta_j)\right) < \exp(-5k_j\log n) \ll \frac{1}{n^3\binom{n}{k_j}},$$

say. A similar bound holds for the probability that $X_{j,A}(i^*) < (1 - e_j(t^*)/s_j)(x_j(t^*) - \theta_j(t^*)/s_j)S_j$, when we have $Z^{--}_{j,A}(i^*) > cS_j/4s_j$ or $Z^{++}_{j,A}(i^*) < -cS_j/4s_j$. Taking a union bound over $1 \le j \le r$, $A \in \binom{[n]}{k_j}$ and $1 \le i^* \le m$ completes the proof. □



8 Trackable variables 

To apply Lemma 7.3 to the extension variables $X_{\phi,J,\Gamma}(i)$, we need to estimate the expected and maximum number of extensions that may be created or destroyed in each step of the process. In this section we establish a bound on the maximum number of extensions created or destroyed; in other words, we verify the boundedness hypothesis. Also, in anticipation of the expected change calculations needed for the trend hypothesis, we show that two types of pathological subgraph configurations that could potentially spoil these calculations are suitably rare. More specifically, we show that, on the event $\mathcal{G}_i$, there are very few extensions in $\Xi_{\phi,J,\Gamma}(i)$ that contain a pair of open pairs $e, f$ such that the inclusion of one as an edge causes the other to become closed, and very few extensions $f$ in $\Xi_{\phi,J,\Gamma}(i)$ for which there are two edges in $f(E_\Gamma \setminus E_J)$ that can both be closed by the addition of the same edge $e_{i+1}$. We stress that we obtain these bounds whenever the variable is trackable (as defined in Subsection 1.2). In particular, this condition holds for the extension variables that track the open routes to $H$ less an edge, the central variables in the proof of Theorem 1.4.

We begin with a technical lemma that amounts to showing that if $X_{\phi,J,\Gamma}$ is trackable then there are no 'implicitly' closed edges in $E_\Gamma \setminus E_J$.

Lemma 8.1 If $X_{\phi,J,\Gamma}(i)$ is a trackable variable and $uv \in E_\Gamma \setminus E_J$ then there does not exist $C \subseteq V_H$ with an injective embedding $\psi : C \to V_\Gamma$ such that

1. $\psi(H[C])$ is a subgraph of the graph $\Gamma' = \Gamma \cup \big( \phi^{-1}(E(i)) \cap \binom{A}{2} \big)$ obtained from $\Gamma$ by adding the edges $ab$ for all $a, b \in A$ with $\phi(a)\phi(b) \in E(i)$,

2. for any vertex $v \in C$ with $\psi(v) \notin A$, every neighbour of $v$ in $H$ belongs to $C$, and

3. there is some edge $e$ in $H[C]$ with $\psi(e) = uv$.

Proof. Assume for a contradiction that $\psi$ is an embedding satisfying conditions (1-3) of the lemma. Define $A' = \{v \in C : \psi(v) \in A\}$. We claim that $|A'| \ge 2$. This is clear if $H$ contains an edge $e$ with $\psi(e) \subseteq A$. Otherwise, condition (1) implies that $C \ne V_H$, as $H$ is not a subgraph of $\Gamma$ by definition of trackability. Then condition (2) implies that $A'$ disconnects $H$, and since $H$ is 2-connected we deduce that $|A'| \ge 2$.

Now let $K$ be the graph obtained from $H[C]$ by deleting all edges inside $A'$. Then $K$ is isomorphic to a subgraph of $\Gamma$ by condition (1), so $S_{A,\Gamma[A \cup \psi(C)]} \le S_{A',K}$. Also, $S_{A',K} = n^{|C|-|A'|}p^{e_K}$ is equal to $S_{(V_H \setminus C) \cup A',H}$ by condition (2). This in turn is at most 1, as $H$ is strictly balanced. We deduce that $S_{A,\Gamma[A \cup \psi(C)]} \le 1$.

Note also that $\psi(C)$ is not contained in $A$, as by condition (3) it contains the edge $\psi(e) = uv$ of $\Gamma$. This rules out the possibility that $(A, \Gamma)$ is strictly dense, so it remains to consider possibility (b) in the definition of trackability. In this case we must have $S_{A,\Gamma[A \cup \psi(C)]} = 1$, and so $S_{(V_H \setminus C) \cup A',H} = 1$, when the fact that $H$ is strictly balanced implies that $C = V_H$, $|A'| = 2$ and $A' \in E_H$. However, the existence of such an embedding of $H$ in $\Gamma'$ is specifically ruled out by the definition of trackability, so we have the required contradiction. □

Now we are ready to verify the boundedness hypothesis. Following the notation of Lemma 7.3 we write $X_{\phi,J,\Gamma}(i+1) - X_{\phi,J,\Gamma}(i) = Y^+_{\phi,J,\Gamma}(i) - Y^-_{\phi,J,\Gamma}(i)$, where $Y^+_{\phi,J,\Gamma}(i) \ge 0$ is the number of maps $f$ in $\Xi_{\phi,J,\Gamma}(i+1) \setminus \Xi_{\phi,J,\Gamma}(i)$ and $Y^-_{\phi,J,\Gamma}(i) \ge 0$ is the number of maps $f$ in $\Xi_{\phi,J,\Gamma}(i) \setminus \Xi_{\phi,J,\Gamma}(i+1)$. Recall that $f : V_\Gamma \to [n]$ is counted by $X_{\phi,J,\Gamma}(i)$ if $f(e) \in O(i)$ for every $e \in E_\Gamma \setminus E_J$, $f(e) \in E(i)$ for every $e \in E_J$, and $f$ restricts to $\phi$ on $A$. Then $f$ will be counted by $Y^-_{\phi,J,\Gamma}(i)$ if there is at least one $e \in E_\Gamma \setminus E_J$ such that $f(e)$ either becomes closed at step $i+1$ or is the edge $e_{i+1}$ chosen by the process at step $i+1$. Also, for each edge $e$ of $J$ and $f$ counted by $X_{\phi,J \setminus e,\Gamma}(i)$, $f$ might be counted by $Y^+_{\phi,J,\Gamma}(i)$ if $e_{i+1} = f(e)$. (We will see below that $f$ may not actually be counted, but for the purpose of an upper bound we do not need to take this into account here.)

Lemma 8.2 (Boundedness hypothesis) With high probability, for every $1 \le i \le m$, assuming $\mathcal{G}_i$ and that $X_{\phi,J,\Gamma}(i)$ is trackable, we have $Y^+_{\phi,J,\Gamma}(i) \le n^{-1/e_H} S_{A,J}$ and $Y^-_{\phi,J,\Gamma}(i) \le n^{-1/e_H} S_{A,J}$.

Proof. We start with the variable $Y^+_{\phi,J,\Gamma}(i)$. Fix an edge $e = ab$ of $J$ and suppose the process chooses the edge $e_{i+1} = uv$ in step $i+1$. Let $A' = A \cup \{a, b\}$, $J' = J \setminus E_{J[A']}$ and define $\phi' : A' \to [n]$ agreeing with $\phi$ on $A$ and satisfying $\phi'(a) = u$, $\phi'(b) = v$. Note that one of $a$ or $b$ may belong to $A$, but not both, as $A$ is independent in $J$. Any $f$ counted by $Y^+_{\phi,J,\Gamma}(i)$ with $f(a) = u$ and $f(b) = v$ is counted by $X_{\phi',J',\Gamma}(i)$; we can bound this by $N_{\phi',J'}$, which by Lemma 5.2 satisfies $N_{\phi',J'} \le n^{4e_{J'}\epsilon} \max_{A' \subseteq B \subseteq V_J} S_{B,J'}$. Since $A \subset A'$ and $(A, J)$ is strictly dense we have $\max_{A' \subseteq B \subseteq V_J} S_{B,J'} \le n^{-1/(e_H-1)} S_{A,J}$. Summing over all edges $e$ of $J$ we estimate $Y^+_{\phi,J,\Gamma}(i) \le n^{-1/e_H} S_{A,J}$.

Now consider the variable $Y^-_{\phi,J,\Gamma}(i)$. Suppose the process chooses the edge $e_{i+1} = uv$ in step $i+1$. Fix an edge $e$ of $\Gamma \setminus J$. We want to estimate the number of embeddings $f$ in $\Xi_{\phi,J,\Gamma}(i)$ for which $f(e)$ is either equal to $e_{i+1}$ or becomes closed in step $i+1$. Since $(A, J)$ is strictly dense, Lemma 5.3 gives an upper bound of $n^{-1/(e_H-1) + 4(e_J+1)\epsilon} S_{A,J}$ on the number of embeddings $f$ with $f(e) = e_{i+1}$.

Next consider an embedding $f$ where $f(e) = xy$ becomes closed in step $i+1$. Then there is an embedding $f_2$ of $H$ in $G(i) \cup \{uv, xy\}$. Write $C' = f(V_J) \cap f_2(V_H)$ and identify the sets $f^{-1}(C')$ and $f_2^{-1}(C')$ as a set $C$ on which $f$ and $f_2$ agree. Then we have $f_2(a) = u$, $f_2(b) = v$ for some $a, b \in V_H$, and we have some $c, d \in C$ with $f(c) = f_2(c) = x$, $f(d) = f_2(d) = y$, where $\{c, d\} \ne \{a, b\}$ and $\{c, d\} \not\subseteq A$ (since $A$ is independent in $\Gamma$). Write $H' = H \setminus \{ab, cd\}$ and let $W$ be the join of $J_1 = J$ and $J_2 = H'$ formed by identifying vertices in $C$ and removing any edges within $A' = A \cup \{a, b\}$.

We claim that $S_{B,W} \le n^{-1/(e_H-1)} S_{A,J}$ for all $A' \subseteq B \subseteq V_W$. Fix such a set $B$ and write $B_1 = B \cap V_{J_1}$ and $B_2 = B \cap V_{J_2}$. We have

$$S_{B,W} = S_{B_1,J} \cdot S_{B_2 \cup C,H} \cdot p^{\beta},$$

where $\beta$ is the number of edges in $J_2$ joining $B_2 \setminus C$ and $C \setminus B_2$. Since $(A, J)$ is strictly dense we have $S_{B_1,J} \le S_{A,J}$, with equality only if $B_1 = A$. Furthermore, since $\{a, b\} \cup C$ has at least 3 vertices, we have $S_{C \cup B_2,H} \le 1$, with equality only if $C \cup B_2 = V_H$. Thus we can restrict our attention to the situation where $B_1 = A$, $B_2 \supseteq V_{J_2} \setminus V_{J_1}$ and $\beta = 0$. In this case we will use Lemma 8.1 to obtain a contradiction. We view $C$ as a subset of $V_H$ and let $\psi$ be the identification of $C$ with the subset of $V_\Gamma$ which is also called $C$. We can assume that condition (1) is satisfied, as otherwise $f$ is an extension of $\phi$ to an embedding of a supergraph of $J$ and then we have the required estimate on $S_{B,W}$ by Lemma 5.3. Also, $\beta = 0$ gives condition (2), and $f_2(cd) = xy = f(e)$ with $e \in E_\Gamma \setminus E_J$ and $c, d \in C$, which gives condition (3). Thus Lemma 8.1 shows that this case does not actually arise. We deduce that $S_{B,W} \le n^{-1/(e_H-1)} S_{A,J}$.

Now applying Lemma 5.2 and summing over all possibilities for $e$ and $W$ gives the required bound. □

Now we turn to two technical issues regarding the expected values of $Y^+_{\phi,J,\Gamma}(i)$ and $Y^-_{\phi,J,\Gamma}(i)$. We would like to approximate these using our estimates for extension variables. In the case of $Y^+_{\phi,J,\Gamma}(i)$, our first approximation is that for each edge $e$ of $J$, an embedding $f$ counted by $X_{\phi,J \setminus e,\Gamma}(i)$ should be counted by $Y^+_{\phi,J,\Gamma}(i)$ if $e_{i+1} = f(e)$. However, we need to account for the possibility that the addition of the edge $e_{i+1} = f(e)$ closes some edge $f(e')$ where $e' \in E_\Gamma \setminus E_J$. In the case of $Y^-_{\phi,J,\Gamma}$, we sum $|C_{f(uv)}(i)|$ over $uv \in E_\Gamma \setminus E_J$ to estimate the number of open edges $xy$ such that choosing $e_{i+1} = xy$ causes a given embedding $f$ in $\Xi_{\phi,J,\Gamma}(i)$ to leave this set. However, we need to account for the possibility that there could be edges $uv, u'v' \in E_\Gamma \setminus E_J$ such that $C_{f(uv)}(i)$ and $C_{f(u'v')}(i)$ have large intersection. We now establish two lemmas showing that these two 'pathological' possibilities have a negligible impact.

Lemma 8.3 (Creation fidelity) If $X_{\phi,J,\Gamma}$ is a trackable variable then, with high probability on the event $\mathcal{G}_i$, the number of extensions $f \in \Xi_{\phi,J,\Gamma}(i)$ with the property that there are distinct $uv, xy \in E_\Gamma \setminus E_J$ such that $G(i) \cup \{f(uv), f(xy)\}$ contains a copy of $H$ is at most $n^{-1/e_H} S_{A,J}$.

Proof. Let $uv, xy \in E_\Gamma \setminus E_J$ be distinct and fixed. Consider any graph $W$ given by the join of $J$ and a copy of $H$ less two edges, where $uv$ and $xy$ are identified with these missing edges. As in Lemma 8.2 it suffices to show that $S_{B,W} \le n^{-1/(e_H-1)} S_{A,J}$ for all $A \subseteq B \subseteq V_W$. The argument is almost identical to that in Lemma 8.2. With the same notation we again have $S_{B,W} = S_{B_1,J} \cdot S_{B_2 \cup C,H} \cdot p^{\beta}$. We again have $S_{B_1,J} \le S_{A,J}$, with equality only if $B_1 = A$. Furthermore, in the current lemma we have $u, v, x, y \in C$, so $|C| \ge 3$, and $S_{C \cup B_2,H} \le 1$, with equality only if $C \cup B_2 = V_H$. Then Lemma 8.1 applies as before to complete the proof. □

Lemma 8.4 (Destruction fidelity) If $uv, u'v' \in O(i)$ are distinct then, on $\mathcal{G}_i$, we have $|C_{uv}(i) \cap C_{u'v'}(i)| \le n^{-1/e_H} p^{-1}$ with high probability.

Proof. Let $ab$ and $cd$ be distinct edges of $H$ and set $H_1 = H \setminus \{ab, cd\}$. Similarly, let $a'b'$ and $c'd'$ be distinct edges of $H$ and set $H_2 = H \setminus \{a'b', c'd'\}$. Now let $W$ be any join of $H_1$ and $H_2$ where $c = c'$ and $d = d'$ but $ab \ne a'b'$. Set $A = \{a, b\} \cup \{a', b'\}$. Then $|A| \ge 3$. Appealing to Lemma 5.3 it suffices to show $S_{B,W} < p^{-1}$ for all $A \subseteq B \subseteq V_W$. Fix such a set $B$. Similarly to before we have $S_{B,W} \le S_{B_1,H_1} S_{C \cup B_2,H_2} p^{\beta_2}$, where $B_1 = B \cap V_{H_1}$, $B_2 = B \cap V_{H_2}$, $C = V_{H_1} \cap V_{H_2}$ and $\beta_2$ is the number of edges in $H_2$ joining $B_2 \setminus C$ and $C \setminus B_2$.

Note that $c, d \in C$, so $S_{C \cup B_2,H_2} = S_{C \cup B_2,H} \le 1$, with equality only when $C \cup B_2 = V_H$. Also, since $H_1$ is strictly dense we have $S_{B_1,H_1} \le 1/p$, with equality only when $B_1 = \{a, b\}$. Thus we obtain the desired inequality $S_{B,W} < p^{-1}$, except possibly in the case when $C \cup B_2 = V_H$, $B_1 = \{a, b\}$ and $\beta_2 = 0$. Also, the same argument reversing the roles of $H_1$ and $H_2$ shows that we obtain the desired inequality, except possibly in the case when $C \cup B_1 = V_H$, $B_2 = \{a', b'\}$ and $\beta_1 = 0$, where $\beta_1$ is the number of edges in $H_1$ joining $B_1 \setminus C$ and $C \setminus B_1$. Since $H$ is 2-connected, the only remaining possibility is when $V_{H_1} = V_{H_2}$. But then $S_{B,W} \le S_{A,H_1} < 1/p$, as $H_1$ is strictly dense and $|A| \ge 3$. Thus in all cases we have the desired inequality. □

9 Trajectory verification and Turán bounds

Now we use the above bounds and Lemma 7.3 to prove Theorem 1.4, which shows that trackable extension variables are well described by the differential equations given earlier in the paper. It will then follow that the process does indeed continue until at least time $t = t_{\max} = \mu(\log n)^{1/(e_H-1)}$, when it has $m = \mu(\log n)^{1/(e_H-1)} pn^2$ edges. In particular, it will follow that variables counting common neighbours of $d$-sets with $p^d n > 1$ and variables counting extensions from non-edge pairs to subgraphs of $H$ with at most $e_H - 2$ edges satisfy these equations. Then Corollary 1.5 is an immediate consequence of the formulae for common neighbours. In particular, when $d = 1$ we deduce the minimum degree statement needed to prove Theorem 1.1. To prove Theorem 1.4 we will show that the good event $\mathcal{G}_m$ holds with high probability, i.e. for every $i \le m$ and trackable extension variable $X_{\phi,J,\Gamma}(i)$ corresponding to a triple in $\mathcal{T}$, we have

$$X_{\phi,J,\Gamma}(i) = (1 \pm e(t)/s_e)(x_{A,J,\Gamma}(t) \pm \theta(t)/s_e) S_{A,J},$$

where $x_{A,J,\Gamma}(t) = q(t)^{e_\Gamma - e_J}(2t)^{e_J}$ and $t, s_e, S_{A,J}, q(t), e(t), \theta(t)$ are as defined in Subsection 1.2.

Proof of Theorem 1.4. To apply Lemma 7.3 we arbitrarily number the triples in $\mathcal{T}$ by $1 \le j \le r$ and identify the extension variables $X_{\phi,J,\Gamma}(i)$ with the variables $X_{j,A}(i)$ appearing in the statement of the lemma. We take $e_j(t) = e(t)$ and $\theta_j(t) = \theta(t)$ for all $1 \le j \le r$. The event $\mathcal{H}_i$ is the event that the estimates given in Lemmas 6.1, 8.2, 8.3 and 8.4 hold up to step $i$. We will give values for the other parameters of the lemma later in this proof.

We start with the main step, which is checking the trend hypothesis. For the expected one-step changes $E[Y^{\pm}_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i]$ we analyse the error terms in our earlier heuristic derivation. We start with the variable $Q(i)$, which counts the number of ordered pairs that are open at step $i$. Write $Q(i+1) - Q(i) = Q^+(i) - Q^-(i)$ with $Q^+(i), Q^-(i) \ge 0$. Since $Q(i+1) = Q(i) - 1 - |C_{e_{i+1}}(i)|$ we have $Q^+(i) = 0$ and $Q^-(i) = 1 + |C_{e_{i+1}}(i)|$. Then Corollary 6.2 gives

$$Q^-(i) = 1 + (1 \pm 2e(t)/s_e)\big(a_H(2t)^{e_H-2}q(t) \pm \theta(t)/s_e\big)p^{-1}.$$

We have $q'(t) = y^+(t) - y^-(t)$, where $y^+(t) = 0$ for all $t$ and $y^-(t) = c(t) = a_H(2t)^{e_H-2}q(t)$. We also have $h_q(t) = (eq + \gamma)'(t)$. Now $e'(t) = P'(t)e^{P(t)} \ge W(t^{e_H-2} + 1)e^{P(t)}$ and $q'(t)/q(t) = -a_H(2t)^{e_H-2}$, so since $W \gg V \gg e_H$ we have $h_q(t)/y^-(t) \ge \big(V + Va_H^{-1}(2t)^{-(e_H-2)}\big)e^{P(t)}$ for $t \ge 0$. Since $s = pn^2$ and $\theta(t) \le 1$ we easily have the required condition for $Q^-(i)$, namely

$$Q^-(i) = \big(y^-_q(t) \pm h_q(t)/4s_e\big)n^2/s.$$

(We only need this estimate for $E(Q^-(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i)$, but actually it always holds on the event $\mathcal{G}_i$.)
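The bookkeeping behind $Q(i)$ can be illustrated on the simplest case $H = K_3$. The following is a minimal sketch, not the paper's analysis: the function name `triangle_free_process` and its parameters are hypothetical, and it counts unordered open pairs (half the ordered count $Q(i)$ used in the text). Each step picks a uniformly random open pair, adds it as an edge, and closes every pair that would now complete a triangle.

```python
import itertools
import random


def triangle_free_process(n, seed=0):
    """Run the H-free process for H = K_3 on vertices 0..n-1 (illustrative sketch).

    Returns the final edge set and the number of (unordered) open pairs
    before each step and after the last one.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Open pairs: non-edges whose addition would not create a triangle.
    open_pairs = set(itertools.combinations(range(n), 2))
    edges = set()
    history = [len(open_pairs)]
    while open_pairs:
        # Choose the next edge uniformly at random among the open pairs.
        u, v = rng.choice(sorted(open_pairs))
        open_pairs.discard((u, v))
        edges.add((u, v))
        adj[u].add(v)
        adj[v].add(u)
        # A pair {b, w} with w adjacent to a becomes closed once ab is an edge.
        for a, b in ((u, v), (v, u)):
            for w in adj[a]:
                if w != b:
                    open_pairs.discard((min(b, w), max(b, w)))
        history.append(len(open_pairs))
    return edges, history
```

Each step decreases the open-pair count by one for the chosen edge plus the number of pairs it closes, mirroring the decomposition $Q^-(i) = 1 + |C_{e_{i+1}}(i)|$ above; the process stops exactly when the graph is a maximal triangle-free graph.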

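Now we check the trend hypothesis in the general case. We write $X_{\phi,J,\Gamma}(i+1) - X_{\phi,J,\Gamma}(i) = Y^+_{\phi,J,\Gamma}(i) - Y^-_{\phi,J,\Gamma}(i)$. The term $Y^+_{\phi,J,\Gamma}(i)$ has contributions corresponding to each edge $e$ of $J$. A function $f$ in $\Xi_{\phi,J \setminus e,\Gamma}(i)$ will be counted by $Y^+_{\phi,J,\Gamma}(i+1)$ if the process chooses the edge $e_{i+1}$ equal to $f(e)$ and this choice of $e_{i+1}$ does not close any edge in $f(E_\Gamma \setminus E_J)$. Now $e_{i+1}$ is chosen uniformly at random among $Q(i)/2$ open edges, so appealing to Lemma 8.3 we can estimate

$$E\big(Y^+_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i\big) = 2Q(i)^{-1} \sum_{e \in J} \big( X_{\phi,J \setminus e,\Gamma}(i) \pm n^{-1/e_H} S_{A,J \setminus e} \big).$$

Now $X_{\phi,J \setminus e,\Gamma}(i) = (1 \pm e(t)/s_e)(x_{A,J \setminus e,\Gamma}(t) \pm \theta(t)/s_e) S_{A,J \setminus e}$. Since $S_{A,J \setminus e} = p^{-1} S_{A,J}$, $n^{-1/e_H} < 1/s_e$ and $\theta(t) \ge 1/2$ for $t \ge 0$ we estimate $E(Y^+_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i)$ as

$$2\big((1 \pm e(t)/s_e)(q(t) \pm \theta(t)/s_e)n^2\big)^{-1} \cdot e_J \cdot (1 \pm e(t)/s_e)\big(q(t)^{e_\Gamma - e_J + 1}(2t)^{e_J - 1} \pm 2\theta(t)/s_e\big)p^{-1} S_{A,J}.$$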

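We have $x'_{A,J,\Gamma}(t) = y^+_{A,J,\Gamma}(t) - y^-_{A,J,\Gamma}(t)$, where $y^+_{A,J,\Gamma}(t) = 2e_J q(t)^{e_\Gamma - e_J}(2t)^{e_J - 1}$ and $y^-_{A,J,\Gamma}(t) = a_H(e_\Gamma - e_J)q(t)^{e_\Gamma - e_J}(2t)^{e_J + e_H - 2}$. We also have $h_{A,J,\Gamma}(t) = (ex_{A,J,\Gamma} + \gamma)'(t)$. To establish the required bound, i.e.

$$E\big(Y^+_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i\big) = \big(y^+_{A,J,\Gamma}(t) \pm h_{A,J,\Gamma}(t)/4s_e\big) S_{A,J}/s,$$

it suffices to show that

$$(1 \pm 4e(t)/s_e)\big(1 \pm 2\theta(t)q(t)^{-1}/s_e\big)\big(1 \pm 2\theta(t)(q(t)^{e_\Gamma - e_J + 1}(2t)^{e_J - 1})^{-1}/s_e\big) \subseteq 1 \pm \big(2e_J q(t)^{e_\Gamma - e_J}(2t)^{e_J - 1}\big)^{-1} h_{A,J,\Gamma}(t)/4s_e. \qquad (4)$$

Setting $x(t) = x_{A,J,\Gamma}(t) = (2t)^{e_J} q(t)^{e_\Gamma - e_J}$ we see that it is necessary to establish that

$$\frac{4e_J e(t) x(t)}{t} + \frac{2e_J \theta(t) x(t)}{t q(t)} + \frac{4e_J \theta(t)}{q(t)} \qquad (5)$$

is bounded above by

$$\frac{1}{4}\big(x(t)e'(t) + x'(t)e(t) + \gamma'(t)\big)$$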

. f^W + „ + M - (er - «,,.„<*)«-») + 2|1 

4 V 2 / 4 

Note that establishing this bound is in fact sufficient. To see this we observe that our choice of $\gamma(t)$ ensures that $h_{A,J,\Gamma}(t)$ is bounded below by some constant (which is a function of $V$). Therefore the terms omitted in (4) are $O(1/s_e) = o(1)$, so do not cause the inequality to be violated when $n$ is sufficiently large. Note also that we can assume that $e_J > 0$, as otherwise $Y^+_{\phi,J,\Gamma} = y^+_{A,J,\Gamma} = 0$. To verify the bound for $t \le 40V/W$ we note that $x(t) \le 9t/4$, as $e_J > 0$, and therefore (5) is at most $9Ve(t) + 15V \le 10Ve^{41V} \le \gamma'(t)/4$. On the other hand, for $t \ge 40V/W$ we note that the first two terms in (5) can each be bounded by $We^{P(t)}x(t)/10$; the remaining term is bounded by $\gamma'(t)/4 \ge 5V$ for $40V/W \le t \le 1/(50V)$ and by $e^{P(t)}x(t)$ for larger $t$.

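Next consider the term $Y^-_{\phi,J,\Gamma}(i)$, which has contributions corresponding to each edge $e$ of $\Gamma \setminus J$. A function $f$ in $\Xi_{\phi,J,\Gamma}(i)$ will be counted by $Y^-_{\phi,J,\Gamma}(i+1)$ if the process either chooses the edge $e_{i+1}$ equal to $f(e)$ or $f(e)$ becomes closed, i.e. $f(e) \in C(i+1)$. Thinking of $e_{i+1}$ as an ordered pair, the number of choices is $2 + |C_{f(e)}(i)|$, each occurring with probability $Q(i)^{-1}$. Therefore, appealing to Lemma 8.4 we have

$$E\big(Y^-_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i\big) = Q(i)^{-1} \sum_{f \in \Xi_{\phi,J,\Gamma}(i)} \sum_{e \in \Gamma \setminus J} \big( 2 + |C_{f(e)}(i)| \pm n^{-1/e_H} p^{-1} \big).$$

We can estimate $|C_{f(e)}(i)|$ by Corollary 6.2, so we estimate $E(Y^-_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i)$ as

$$\big((1 \pm e(t)/s_e)(q(t) \pm \theta(t)/s_e)n^2\big)^{-1} \cdot (e_\Gamma - e_J) \cdot (1 \pm e(t)/s_e)\big(q(t)^{e_\Gamma - e_J}(2t)^{e_J} \pm \theta(t)/s_e\big) S_{A,J}$$

$$\cdot \big(1 \pm e(t)/s_e \pm n^{-1/e_H}\big)\big(a_H(2t)^{e_H-2}q(t) \pm 2\theta(t)/s_e\big)p^{-1}.$$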

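Now to establish the required bound, i.e.

$$E\big(Y^-_{\phi,J,\Gamma}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i\big) = \big(y^-_{A,J,\Gamma}(t) \pm h_{A,J,\Gamma}(t)/4s_e\big) S_{A,J}/s,$$

it suffices to show that

$$(1 \pm 4e(t)/s_e)\big(1 \pm 2\theta(t)q(t)^{-1}/s_e\big)\big(1 \pm \theta(t)(q(t)^{e_\Gamma - e_J}(2t)^{e_J})^{-1}/s_e\big)\big(1 \pm 2\theta(t)(a_H(2t)^{e_H-2}q(t))^{-1}/s_e\big)$$

$$\subseteq 1 \pm \big(a_H(e_\Gamma - e_J)q(t)^{e_\Gamma - e_J}(2t)^{e_J + e_H - 2}\big)^{-1} h_{A,J,\Gamma}(t)/4s_e.$$

And this reduces to showing that

$$4a_H(e_\Gamma - e_J)(2t)^{e_H-2}x(t)e(t) + \frac{2a_H(e_\Gamma - e_J)(2t)^{e_H-2}x(t)\theta(t)}{q(t)} + a_H(e_\Gamma - e_J)(2t)^{e_H-2}\theta(t) + \frac{2(e_\Gamma - e_J)x(t)\theta(t)}{q(t)}$$

is bounded above by

$$\frac{e^{P(t)}x(t)}{4}\left(\frac{W}{2}t^{e_H-2}\right) + \frac{\gamma'(t)}{4}.$$

This follows by estimates very similar to those given above for $Y^+_{\phi,J,\Gamma}(i)$. We omit the details, except for remarking that it is helpful to observe that the term $a_H(e_\Gamma - e_J)(2t)^{e_H-2}$ is bounded by $\gamma'(t)/4$ for $t \le 1/(50V)$.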

This verifies the trend hypothesis of Lemma 7.3. To finish the proof we check the remaining conditions. The boundedness hypothesis follows from Lemma 8.2, as we have $n^{1/e_H} > n^{1/e_H - \epsilon} = s_e^2 n^{\epsilon}$. We have $|\mathcal{T}| = r \le V^{3V}$, $n^{2\epsilon} < s_e^2 n^{\epsilon} < n < pn^2 = s \le m < n^2$ and $s_e = n^{1/2e_H - \epsilon} \ge n^{2\epsilon}$. The functions $x_{A,J,\Gamma}(t)$ and $y^{\pm}_{A,J,\Gamma}(t)$ all have the form $F(t)e^{-Kt^{e_H-1}}$, where $F$ is a polynomial of degree at most $V + e_H$, and $K$ and all coefficients in $F$ are non-negative and bounded above by $W$, say. Here we can use

$$\int_0^\infty t^a e^{-t}\,dt = a! \quad \text{and} \quad \sup_{t \ge 0} t^a e^{-t} = (a/e)^a \quad \text{for } a \in \mathbb{N}$$

to see that $\sup_{t \ge 0} |y^{\pm}_{A,J,\Gamma}(t)|$, $\sup_{t \ge 0} |x'_{A,J,\Gamma}(t)|$ and $\int_0^\infty |x''_{A,J,\Gamma}(t)|\,dt$ are all bounded by some constant $C$ depending only on $W$. Also, recall that $e(t) = e^{P(t)} - 1$ with $P(t) = W(t^{e_H-1} + t)$, $h_{A,J,\Gamma}(t) = (ex_{A,J,\Gamma} + \gamma)'(t)$, and $\gamma(t)$ is a smooth increasing function such that $\gamma(t)$ and $\gamma'(t)$ are bounded by absolute constants. The initial conditions $e(0) = \gamma(0) = 0$ hold. Since $t \le t^* = \mu(\log n)^{1/(e_H-1)}$, by choosing $\mu$ sufficiently small we can ensure that $\sup_{t \ge 0} |h_{A,J,\Gamma}(t)| < n^{\epsilon}$ and $\int_0^\infty |h'_{A,J,\Gamma}(t)|\,dt < n^{\epsilon}$. Finally we can choose $c = 1/2$, since $\theta(t) = 1/2 + \gamma(t)$, so $\theta(t) + e(t)x(t)/2 - \gamma(t)/2 \ge 1/2$. □
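The two elementary facts used above, $\int_0^\infty t^a e^{-t}\,dt = a!$ and $\sup_{t \ge 0} t^a e^{-t} = (a/e)^a$, can be sanity-checked numerically. The sketch below is only an illustration: the helper names, the truncation point `upper` and the step count are arbitrary choices and not part of the paper.

```python
import math


def integral_t_a_exp(a, upper=60.0, steps=200_000):
    """Trapezoidal estimate of the integral of t^a e^{-t} over [0, upper].

    The tail beyond `upper` is negligible for the small exponents used here.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * t**a * math.exp(-t)
    return total * h


def sup_t_a_exp(a):
    """Claimed supremum (a/e)^a of t^a e^{-t}, attained at t = a for a >= 1."""
    return (a / math.e) ** a
```

Comparing `integral_t_a_exp(a)` with `math.factorial(a)`, and a grid maximum of $t^a e^{-t}$ with `sup_t_a_exp(a)`, for a few small values of $a$ gives a quick check of both identities.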



10 Counting small subgraphs 

In this short section we apply our results to count small subgraphs in the $H$-free process and compare these counts to those known for the $G(n,p)$ model. A rough summary is that the $H$-free process looks very much like $G(n,p)$ from this perspective, except that it does not contain any graphs that contain $H$. A more precise description is given by Theorem 1.6, which we now prove.

Proof of Theorem 1.6. Statement (i) follows from Lemma 4.1, as $\Gamma[B]$ does not appear in $G(i)$ with high probability, and therefore $\Gamma$ itself does not appear with high probability (note that the failure probability here decays polynomially in $n$, not exponentially). Statement (ii) follows from Theorem 1.4 applied to the trackable variable $X_\Gamma(i) = X_{\emptyset,\Gamma,\Gamma}(i)$. It remains to consider the case when $S_{\Gamma[B]} \ge 1$ for all $B \subseteq V_\Gamma$. Form the extension series $\emptyset = B_0 \subset B_1 \subset \cdots \subset B_d = V_\Gamma$, as defined earlier in the paper. We divide the $m$ steps of the process into $d$ equal intervals, and in the $j$th interval we show that with high probability there is an extension from a fixed copy of $\Gamma[B_{j-1}]$ (found in the previous interval) to a copy of $\Gamma[B_j]$. By construction every step of the extension series is strictly balanced, and our assumption in this case implies that the scalings in each step satisfy $S_{B_{j-1},\Gamma[B_j]} \ge 1$. Suppose that $\phi : B_{j-1} \to [n]$ is an embedding of $\Gamma[B_{j-1}]$ in $G((j-1)m/d)$. If $S_{B_{j-1},\Gamma[B_j]} > 1$ then the variable $X_{\phi,\Gamma[B_j],\Gamma[B_j]}(i)$ is trackable, so the required extension exists by Theorem 1.4 (in fact there are many such extensions). On the other hand, if $S_{B_{j-1},\Gamma[B_j]} = 1$ we can apply Theorem 1.4 to the trackable variables $X_{\phi,\Gamma[B_j] \setminus e,\Gamma[B_j]}(i)$ with $e \in E_{\Gamma[B_j]} \setminus E_{\Gamma[B_{j-1}]}$. Writing $a_j = e_{\Gamma[B_{j+1}]} - e_{\Gamma[B_j]}$ we can estimate the probability that in step $i$ the edge $e_i$ completes some embedding of $\Gamma[B_j] \setminus e$ for some $e$ to an embedding of $\Gamma[B_j]$ by $Q(i)^{-1} \sum_e X_{\phi,\Gamma[B_j] \setminus e,\Gamma[B_j]}(i) \sim a_j(2t)^{a_j - 1}/(pn^2)$. Since the length of each interval is $m/d > s = pn^2$ and $t > 1$ (ignoring the first half of the first interval, say) we see that the required extension appears with high probability. □

Remark. Our results for counting labelled copies of $\Gamma$ in the $H$-free process mirror those obtained for the analogous counts in $G(n,p)$. However, rather more is known in the $G(n,p)$ model, some of which is surveyed in Section VII of [27]. In the supercritical case Barbour, Karoński and Ruciński [6] gave a central limit theorem with estimates on the rate of convergence for the appropriately normalised count. Spencer [32] analysed the critical case: one of his results concerns the case when $\Gamma$ is strictly balanced, when he obtains the asymptotic probability for $\Gamma$ to appear when $p$ is near the threshold. It seems plausible that similar results may hold for the $H$-free process: in the supercritical case one would need to extract distributional information from the differential equations method (along the lines of [29]), and in the critical case one would need a more accurate analysis of the above proof (which seems to suggest a Poisson approximation). For the sake of brevity we do not pursue these possibilities here.



11 Smooth independence 

We have now shown that the $H$-free process continues until at least the time $t_{\max} = \mu(\log n)^{1/(e_H-1)}$, when it has $m = \mu(\log n)^{1/(e_H-1)} pn^2$ edges. In this section we describe an additional assumption ('smooth independence') on $H$, under which we show that the independence number of the resulting graph is at most

$$a = 3\mu^{-1}(\log n)^{1 - 1/(e_H-1)} p^{-1}.$$

Since the independence number cannot increase when more edges are added, we also have the same upper bound for the terminal graph of the process. The main step of our proof will be to show that, for any set $I$ of size $a$, with high probability we can track the number of open pairs contained within $I$: at time $t$ there will be roughly $q(t)|I|^2$ open ordered pairs in $I$. Then a simple union bound calculation will show that with high probability $I$ is not independent at time $t_{\max}$.

To track the open pairs within a set $I$ we use Lemma 7.3, but we cannot simply apply the lemma directly, due to the possibility of closing a large number of pairs in $I$ in a single step of the process. Note that in this application of Lemma 7.3 we will take $k_1 = a$ and $S_1 = a^2$. So we will not be able to achieve the boundedness hypothesis in a useful way if we allow our process to close $a$ edges in the set $I$ in a single step (and this certainly is a possibility for many choices of $H$). To deal with this, we say that the edge added in step $i$ is $I$-good if it closes at most $n^{-5\epsilon}p^{-1}$ ordered pairs in $I$, otherwise it is $I$-bad. Then we say that a pair $uv$ in $I$ is $I$-closed at step $i$ if there is some step $i' \le i$ such that $e_{i'}$ is $I$-good and $G(i') \cup \{uv\}$ contains a copy of $H$. If $uv$ in $I$ is not in $E(i)$ and not $I$-closed we say that it is $I$-open at step $i$. Note that an $I$-closed pair is closed, but an $I$-open pair could be open or closed (but not an edge). Let $Q_I(i)$ be the number of open ordered pairs in $I$ at step $i$ and $X_I(i)$ be the number of $I$-open ordered pairs in $I$ at step $i$. We write $P_I \subseteq E(m)$ for the set of ordered edges at time $t_{\max}$ that are $I$-bad. Then we say that $H$ has smooth independence if with high probability $|P_I| < n^{-5\epsilon}p^{-1}$ for every set $I$ of size $a$.

Our first step is to apply Lemma 7.3 to track the number of $I$-open pairs in $I$.

Lemma 11.1 If $H$ has smooth independence, then with high probability, for any set $I$ of size $a$, the number of $I$-open ordered pairs in $I$ at step $i$ is $X_I(i) = (1 \pm e(t)n^{-2\epsilon})(q(t) \pm n^{-2\epsilon})a^2$.

Proof. We apply Lemma 7.3 with $r = 1$, $k_1 = a$, $X_{1,I}(i) = X_I(i)$ for $I \in \binom{[n]}{a}$, $x_1(t) = q(t)$, $e_1(t) = e(t)$, $\gamma_1(t) = \gamma(t)$, $\theta_1(t) = \theta(t)$, $s_1 = n^{2\epsilon}$ and $S_1 = a^2$. We let $\mathcal{H}_i$ be the event that the estimates given by Theorem 1.4 hold up to step $i$ and that $|P_I| < n^{-5\epsilon}p^{-1}$ for every set $I$ of size $a$.

The main step is verifying the trend hypothesis of Lemma 7.3. Note that adding an edge cannot create any new $I$-open pairs, so we always have $Y^+_{1,I}(i) = 0$. Now we calculate the expected one-step change $E(Y^-_{1,I}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i)$. Recall that a pair $e$ becomes closed at step $i+1$ if the process chooses the edge $e_{i+1}$ in $C_e(i)$, so a pair $e$ in $I$ becomes $I$-closed if it is $I$-open and $e_{i+1}$ is chosen in $C_e(i) \setminus P_I$. Also, if $e$ in $I$ is open as well as $I$-open it may become an edge if the process chooses $e_{i+1} = e$. Now $e_{i+1}$ is chosen uniformly among $Q(i)$ open ordered pairs at step $i$, so

$$E\big(Y^-_{1,I}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i\big) = Q(i)^{-1} \sum_{e \in X_I(i)} \big( |C_e(i) \setminus P_I| \pm 1 \big).$$

(Here we also wrote $X_I(i)$ for the set of $I$-open pairs in $I$.) Temporarily ignoring the error terms, this suggests the equation $x_1'(t) = -q(t)^{-1}x_1(t)c(t)$, which has $q(t)$ as a solution, explaining our choice of $x_1(t)$ above. To account for the error terms, we estimate $Q(i)$ by Theorem 1.4, $C_e(i)$ by Corollary 6.2, $X_I(i)$ by the fact that we are conditioning on $\mathcal{G}_i$ (interpreted for the current application of Lemma 7.3) and $P_I$ by definition of the event $\mathcal{H}_i$. Thus we estimate $E(Y^-_{1,I}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i)$ as

$$\big((1 \pm e(t)/s_e)(q(t) \pm \theta(t)/s_e)n^2\big)^{-1} \cdot (1 \pm e(t)/s_1)(q(t) \pm \theta(t)/s_1)a^2$$

$$\cdot (1 \pm 2e(t)/s_e)\big(a_H(2t)^{e_H-2}q(t) \pm \theta(t)/s_e \pm n^{-5\epsilon}\big)p^{-1}.$$

Recalling that $s = pn^2$, $y^-_1(t) = a_H(2t)^{e_H-2}q(t)$ and $h_q(t) = (eq + \gamma)'(t)$ we see that we have the required condition

$$E\big(Y^-_{1,I}(i) \mid \mathcal{G}_i \wedge \mathcal{H}_i\big) = \big(y^-_1(t) \pm h_q(t)/4s_1\big)a^2/s.$$

The boundedness hypothesis follows immediately from the definition of $I$-open pairs. Note that we can arrange for $s_1^2 k_1 n^{\epsilon} = an^{5\epsilon} < n$, since $\epsilon$ is small. The remaining conditions of Lemma 7.3 follow by similar calculations as in the proof of Theorem 1.4. □
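The heuristic equation $x_1'(t) = -q(t)^{-1}x_1(t)c(t)$ with $c(t) = a_H(2t)^{e_H-2}q(t)$ reduces to $x_1'(t) = -a_H(2t)^{e_H-2}x_1(t)$, the same equation that $q$ satisfies. The sketch below integrates it numerically and compares with the closed form $\exp(-a_H 2^{e_H-2} t^{e_H-1}/(e_H-1))$. It assumes the initial condition $x_1(0) = q(0) = 1$ (all pairs open at the start); the numerical values of `a_H` passed in are placeholders, since the constant $a_H$ is defined outside this section, and the function names are hypothetical.

```python
import math


def solve_x1(t_end, a_H, e_H, steps=2000):
    """Integrate x' = -a_H (2t)^(e_H-2) x, x(0) = 1, with classical RK4."""
    def rhs(t, x):
        return -a_H * (2.0 * t) ** (e_H - 2) * x

    h = t_end / steps
    t, x = 0.0, 1.0
    for _ in range(steps):
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, x + h * k1 / 2)
        k3 = rhs(t + h / 2, x + h * k2 / 2)
        k4 = rhs(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


def q_closed_form(t, a_H, e_H):
    """Solution of q'/q = -a_H (2t)^(e_H-2) with q(0) = 1."""
    return math.exp(-a_H * 2.0 ** (e_H - 2) * t ** (e_H - 1) / (e_H - 1))
```

Agreement of `solve_x1` with `q_closed_form` for sample parameters illustrates why $x_1(t) = q(t)$ is the natural trajectory for the density of $I$-open pairs.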

Next we show that a similar estimate holds for the number of open pairs in $I$.

Lemma 11.2 If $H$ has smooth independence, then with high probability, for every set $I$ of size $a$, the number of open ordered pairs in $I$ at step $i$ is $Q_I(i) = (1 \pm e(t)n^{-2\epsilon})(q(t) \pm 2n^{-2\epsilon})a^2$.

Proof. We need to estimate the number of ordered pairs in $I$ that are $I$-open but not open. By Corollary 6.2 we can bound the number of pairs closed by any edge by $p^{-1}\log n$ (say). By smooth independence we can assume that $|P_I| < n^{-5\epsilon}p^{-1}$, so at most $n^{-5\epsilon}p^{-1} \cdot p^{-1}\log n$ pairs in $I$ are closed but $I$-open. The required bound follows from these estimates and Lemma 11.1. □

Finally, we can show that the independence number of the process at time $m$ is at most $a$.

Lemma 11.3 If $H$ has smooth independence, then with high probability, at time $m$ every set $I$ of size $a$ contains at least one edge.

Proof. At step $i+1$ the process chooses an edge uniformly at random from one of the $Q(i)$ open ordered pairs. Since $Q_I(i)$ of these belong to $I$, it fails to choose an edge in $I$ with probability $1 - Q_I(i)/Q(i)$. Multiplying these probabilities and taking a union bound over $I$ we can bound the probability that there is an independent set $I$ of size $a$ by $p_a = \binom{n}{a} \max_I \prod_{i=1}^{m} (1 - Q_I(i)/Q(i))$. By Theorem 1.4 and Lemma 11.2 we have

$$Q_I(i)/Q(i) = \big((1 \pm e(t)/s_e)(q(t) \pm \theta(t)/s_e)n^2\big)^{-1}(1 \pm e(t)n^{-2\epsilon})(q(t) \pm 2n^{-2\epsilon})a^2.$$

Recalling that $s_e = n^{1/2e_H - \epsilon}$ and $\mu$ is chosen small enough that $q(t)^{-1}$ and $e(t)$ are at most $n^{\epsilon}$ for $t \le t_{\max}$ we can estimate $Q_I(i)/Q(i) = (1 \pm 10n^{-\epsilon})(a/n)^2$. Therefore

$$\log p_a = a(\log n - \log a + 1 + O(1/n)) - m\big((1 \pm 10n^{-\epsilon})(a/n)^2 \pm 2(a/n)^4\big).$$

Also, since $m = \mu(\log n)^{1/(e_H-1)} pn^2$ and $a = 3\mu^{-1}(\log n)^{1 - 1/(e_H-1)} p^{-1}$ we have $m(a/n)^2 = 3a\log n$. Thus we obtain

$$\log p_a < -a\log n = -3\mu^{-1}(\log n)^{2 - 1/(e_H-1)} p^{-1},$$

so $p_a < \exp(-n^{1/e_H})$ (say), as required. □
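The identity $m(a/n)^2 = 3a\log n$ is pure algebra in the definitions of $m$ and $a$, and the leading-order estimate $\log p_a < -a\log n$ follows from it. The sketch below checks both numerically. The sample values of `n`, `p`, `mu` and `e_H` are illustrative assumptions only, the function names are hypothetical, and the error terms $\pm 10n^{-\epsilon}$, $\pm 2(a/n)^4$ and $O(1/n)$ are deliberately dropped.

```python
import math


def process_length(n, p, mu, e_H):
    """m = mu * (log n)^(1/(e_H-1)) * p * n^2."""
    return mu * math.log(n) ** (1.0 / (e_H - 1)) * p * n**2


def independence_bound(n, p, mu, e_H):
    """a = 3 * mu^(-1) * (log n)^(1 - 1/(e_H-1)) * p^(-1)."""
    return 3.0 / mu * math.log(n) ** (1.0 - 1.0 / (e_H - 1)) / p


def leading_log_pa(n, p, mu, e_H):
    """Leading-order part of log p_a, with all error terms omitted."""
    m = process_length(n, p, mu, e_H)
    a = independence_bound(n, p, mu, e_H)
    return a * (math.log(n) - math.log(a) + 1.0) - m * (a / n) ** 2
```

For example, with $n = 10^{12}$ and $p = n^{-2/3}$ (the density for $H = C_4$), the powers of $\log n$ and of $p$ cancel exactly in $m(a/n)^2$, leaving $3a\log n$.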

12 Independence number and Ramsey bounds 

In this section we show that cliques and cycles both have the smooth independence property. By Lemma 11.3 this is enough to prove Theorems 1.8 and 1.9, and then Theorem 1.2 follows immediately from Theorem 1.8. We will also show that a graph $H$ satisfying the hypothesis of Theorem 1.7 has smooth independence, which is enough to prove that theorem.

We start with cycles, where we deduce smooth independence from a path-counting argument.

Lemma 12.1 The $\ell$-cycle $C_\ell$ has smooth independence for $\ell \ge 4$.

Proof. Suppose $I \subseteq [n]$ is a set of $a$ vertices and let $P_I \subseteq E(m)$ be the ordered edges at time $t_{\max}$ that are $I$-bad. We need to show that with high probability $|P_I| < n^{-5\epsilon}p^{-1}$ for all such $I$. Consider the contrary event that $|P_I| \ge n^{-5\epsilon}p^{-1}$, i.e. there are at least $n^{-5\epsilon}p^{-1}$ ordered edges that each close at least $n^{-5\epsilon}p^{-1}$ ordered pairs in $I$. Then there is some ordered pair of edges $uv, xy$ of $C_\ell$ and $P'_I \subseteq P_I$ with $|P'_I| \ge \ell^{-1}n^{-5\epsilon}p^{-1}$ such that for every edge $cd$ in $P'_I$ there are at least $\ell^{-1}n^{-5\epsilon}p^{-1}$ embeddings $f$ of $C_\ell \setminus uv$ with $f(x) = c$, $f(y) = d$ and $f(u), f(v) \in I$.

Set $I_0 = I$ and for $1 \le j \le \ell - 2$ define

$$I_j = \{ v : |N_{G(m)}(v) \cap I_{j-1}| \ge n^{-10\epsilon}pn \}.$$

By Theorem 1.4 the degree of any vertex at time $t$ is $(1 \pm e(t)/s_e)(2t \pm \theta(t)/s_e)pn$. Now $p = n^{-(\ell-2)/(\ell-1)}$ and $t \le t_{\max} = \mu(\log n)^{1/(\ell-1)}$, so $pn = n^{1/(\ell-1)}$ and we can bound all degrees by $(n\log n)^{1/(\ell-1)}$. It follows that there are at most $(n\log n)^{j/(\ell-1)}$ paths of length $j$ starting at any given vertex, for any $j$. Also, if $v \notin I_j$ we can improve on this estimate when counting paths of length $j$ that start at $v$ and end in $I$. To see this, consider choosing the vertex sequence of such a path starting at $v$, say $v = v_{j-1}, \ldots, v_0 \in I$. At each step we have at most $(n\log n)^{1/(\ell-1)}$ choices, and there must be some $j - 1 \ge j' \ge 1$ where $v_{j'} \notin I_{j'}$ but $v_{j'-1} \in I_{j'-1}$, when by definition we have at most $n^{-10\epsilon}pn$ choices. This gives at most $j(n^{-10\epsilon}pn)(n\log n)^{(j-1)/(\ell-1)} < n^{-9\epsilon}n^{j/(\ell-1)}$ paths of length $j$ that start at $v$ and end in $I$.

Suppose without loss of generality that removing $uv$ and $xy$ from the cycle leaves a path of length $\ell_1$ joining $u$ to $x$ and a path of length $\ell_2$ joining $v$ to $y$, with $\ell_1 + \ell_2 = \ell - 2$ and $\ell_1 > 0$ (we might have $\ell_2 = 0$, i.e. $v = y$). We claim that for any edge $cd$ in $P'_I$ we must have $c \in I_{\ell_1}$ and $d \in I_{\ell_2}$. For suppose that $c \notin I_{\ell_1}$. Then there are at most $n^{-9\epsilon}n^{\ell_1/(\ell-1)}$ paths of length $\ell_1$ that start at $c$ and end in $I$. Also, there are at most $(n\log n)^{\ell_2/(\ell-1)}$ paths of length $\ell_2$ that start at $d$ and end in $I$. Thus we bound the number of embeddings $f$ of $C_\ell \setminus uv$ with $f(x) = c$, $f(y) = d$ and $f(u), f(v) \in I$ by $n^{-9\epsilon}n^{\ell_1/(\ell-1)} \cdot (n\log n)^{\ell_2/(\ell-1)} < \ell^{-1}n^{-5\epsilon}p^{-1}$, a contradiction. Thus we have $c \in I_{\ell_1}$, and the same argument gives $d \in I_{\ell_2}$.

Now by Lemma 4.3, with high probability we have $|I_j| \le a(8^{-1}\epsilon n^{-10\epsilon}pn)^{-j} < n^{1 - (j+1)/(\ell-1) + 11j\epsilon}$ for $1 \le j \le \ell - 2$ and every $I$ of size $a$. Then by Lemma 4.2 with high probability we have

$$e(I_{\ell_1}, I_{\ell_2}) \le \max\{ 4\epsilon^{-1}(|I_{\ell_1}| + |I_{\ell_2}|),\ p|I_{\ell_1}||I_{\ell_2}|n^{2\epsilon} \}.$$

This is less than $n^{-1/\ell}p^{-1}$ unless $\ell_2 = 0$. Also, if $\ell_2 = 0$ then $\ell_1 = \ell - 2$, so $|I_{\ell_1}| < n^{11\ell\epsilon}$ and we can bound the number of edges incident to $I_{\ell_1}$ by $|I_{\ell_1}|(n\log n)^{1/(\ell-1)} < n^{1/(\ell-1) + 12\ell\epsilon}$. Either way we have $e(I_{\ell_1}, I_{\ell_2}) < \ell^{-1}n^{-5\epsilon}p^{-1} \le |P'_I|$, by our earlier assumption, which contradicts the fact that any edge $cd$ in $P'_I$ has $c \in I_{\ell_1}$ and $d \in I_{\ell_2}$. Therefore with high probability we have $|P_I| < n^{-5\epsilon}p^{-1}$ for all $I$, i.e. $H$ has the smooth independence property. □

For cliques, we first consider the case $H = K_s$ for some $s \ge 6$. Then $p = n^{-2/(s+1)}$. Consider
any two edges $uv$, $xy$ of $H$ and let $H^- = H \setminus uv$. We have $S_{xy,H^-} = p^{-1}$ and for $s \ge 6$ we have
$S_{xy,H^-[B]} \ge p^{-(s-3)/2} > p^{-1}$ for any $B$ with $xy \subsetneq B \subsetneq V_H$, i.e. $(xy,H^-)$ is strictly balanced. We show
that this more general property suffices for smooth independence. Note that if $H$ is any graph such
that $(xy,H^-)$ is strictly balanced for all $xy, uv \in E_H$ then $H$ has minimum degree at least 3. (To see
this, assume for a contradiction that $d_H(u) = 2$ and consider an extension $(xy,H^-)$ where $u \notin xy$.)
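
The balance condition for cliques can be checked by exact arithmetic. The sketch below is illustrative only: it assumes the scaling $S_{A,G} = n^{|V_G|-|A|}p^{e(G)-e(G[A])}$ with $p = n^{-2/(s+1)}$ (which reproduces $S_{xy,H^-} = p^{-1}$), and the function names `log_n_scaling` and `is_strictly_balanced` are hypothetical helpers, not notation from the paper.

```python
from fractions import Fraction
from itertools import combinations


def log_n_scaling(s, B, uv):
    """Exponent of n in S_{xy,H^-[B]} for H = K_s, H^- = K_s minus uv, xy = (0, 1).

    Assumes S_{A,G} = n^(|V_G| - |A|) * p^(e(G) - e(G[A])) with p = n^(-2/(s+1)).
    """
    b = len(B)
    # Edges of H^- inside B: all pairs of B, minus uv if both endpoints lie in B.
    edges_in_B = b * (b - 1) // 2 - (1 if set(uv) <= set(B) else 0)
    # The base set {x, y} spans exactly one edge (xy), which is not counted.
    return (b - 2) - Fraction(2, s + 1) * (edges_in_B - 1)


def is_strictly_balanced(s):
    """True if S_{xy,H^-[B]} > S_{xy,H^-} for every uv != xy and every xy < B < V_H."""
    vertices = tuple(range(s))
    xy = (0, 1)
    for uv in combinations(vertices, 2):
        if uv == xy:
            continue
        full = log_n_scaling(s, vertices, uv)  # exponent of S_{xy,H^-}
        for b in range(3, s):
            for rest in combinations(vertices[2:], b - 2):
                if log_n_scaling(s, xy + rest, uv) <= full:
                    return False
    return True
```

Under these assumptions `is_strictly_balanced(5)` is false (a set $B$ of size 3 gives equality), while it is true for $s = 6, 7, 8$, matching the separate treatment of $K_5$ below.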

Lemma 12.2 Suppose that (xy, H \ uv) is strictly balanced for any two edges uv, xy of H. Then H 
has smooth independence. 

Proof. Suppose $I \subseteq [n]$ is a set of $a$ vertices and let $P_I \subseteq E(m)$ be the ordered edges at time $i_{\max}$
that are $I$-bad. We need to show that with high probability $|P_I| < n^{-5\epsilon}p^{-1}$ for all such $I$. Consider
the contrary event that $|P_I| \ge n^{-5\epsilon}p^{-1}$, i.e. there are at least $n^{-5\epsilon}p^{-1}$ ordered edges that each close
at least $n^{-5\epsilon}p^{-1}$ ordered pairs in $I$. Then there is some ordered pair of ordered edges $uv$, $xy$ of $H$
with $u \notin \{x,y\}$ and $P'_I \subseteq P_I$ with $|P'_I| > (2e_H)^{-1}n^{-5\epsilon}p^{-1}$ such that for every edge $cd$ in $P'_I$ there are
at least $(2e_H)^{-1}n^{-5\epsilon}p^{-1}$ embeddings $f$ of $H^- = H \setminus uv$ with $f(x) = c$, $f(y) = d$ and $f(u), f(v) \in I$.

Write $H^- = H \setminus uv$. Since $(xy,H^-)$ is strictly balanced we have $S_{B,H^-} < 1$ for any $B$ with
$xyu \subseteq B \subsetneq V_H$. Applying Lemma 5.2 we see that for any $a, c, d \in [n]$ there are at most $n^{4e_H\epsilon}$
embeddings $f$ of $H^- = H \setminus uv$ with $f(x) = c$, $f(y) = d$ and $f(u) = a$. For each edge $cd \in P'_I$ let
$U_{cd}$ be the set of vertices $a \in I$ such that there is at least one embedding $f$ of $H^- = H \setminus uv$ with
$f(x) = c$, $f(y) = d$ and $f(u) = a$. By definition of $P'_I$ we must have

$$|U_{cd}| > (2e_H)^{-1}n^{-5\epsilon}p^{-1}/n^{4e_H\epsilon} > n^{-10e_H\epsilon}p^{-1}$$

(say) for every edge $cd \in P'_I$. Next we need the following claim.

Claim. $|U_{cd} \cap U_{c'd'}| < n^{-1/e_H}p^{-1}$ for any two edges $cd, c'd' \in P'_I$.

Proof. Consider two embeddings $f_1, f_2$ of $H^-$ such that $f_1(x) = c$, $f_1(y) = d$, $f_2(x) = c'$, $f_2(y) = d'$
and $f_1(u) = f_2(u) = a$. Let $C = f_1(V_H) \cap f_2(V_H)$ and $H' = H \setminus \{uv, xy\}$. Let $W$ be the join of
$J_1 = H'$ and $J_2 = H'$ formed by identifying the sets $f_1^{-1}(C)$ and $f_2^{-1}(C)$ as a single set $C$ on which
$f_1$ and $f_2$ agree. Note that we have $u \in C$. For ease of notation we let $x, y$ denote the copies of $x, y$
in $J_1$ and $x', y'$ the copies of $x, y$ in $J_2$. Let $A = \{x, y\} \cup \{x', y'\}$. Since $cd \ne c'd'$ we have $|A| \ge 3$.
Define $\phi : A \to [n]$ by $\phi(x) = c$, $\phi(y) = d$, $\phi(x') = c'$, $\phi(y') = d'$. We want to estimate $N_{\phi,W}$. The
argument is very similar to that in Lemma 6.1. Choose $B$ with $A \subseteq B \subseteq W$ maximising $S_{B,W}$.
We have cases depending on how $V_{J_1}$ and $V_{J_2}$ intersect. If $f_1(V_H) = f_2(V_H)$, i.e. $V_{J_1} = V_{J_2}$, then
we have $S_{B,W} \le S_{B,H^-} \le 1 < p^{-1}$, since $(xy,H^-)$ is strictly balanced and $|A| \ge 3$. We henceforth
suppose that $f_1(V_H) \ne f_2(V_H)$. Define $B_1 = B \cap V_{J_1}$ and $B_2 = B \cap V_{J_2}$. Next we consider the
case $V_{J_1} \subseteq V_{J_2} \cup A$. If $B_2 \ne \{x', y'\}$ then we have $S_{B,W} \le S_{B_2,J_2} \le 1 < 1/p$ because $(x'y', J_2)$
is strictly balanced. If $B_2 = \{x', y'\}$ then we note that, since $H$ has minimum degree at least 3
and $V_{J_1} \setminus V_{J_2} \ne \emptyset$, we have $S_{B,W} \le pS_{B_2,J_2} \le 1 < 1/p$. The analogous argument handles the case
$V_{J_2} \subseteq V_{J_1} \cup A$.

Now suppose that $V_{J_1} \setminus (V_{J_2} \cup A)$ and $V_{J_2} \setminus (V_{J_1} \cup A)$ are non-empty. We consider subcases according
to $B_1$ and $B_2$. The first subcase is $B_1 \cup C \ne V_{J_1}$. Then we have $S_{B_1 \cup C, J_1} = S_{B_1 \cup C, H^-} < 1$, since $u \in C$
and $(xy,H^-)$ is strictly balanced. Also $S_{B_2,J_2} \le S_{x'y',H^-} = p^{-1}$, so $S_{B,W} = S_{B_2,J_2}S_{B_1 \cup C, J_1} < p^{-1}$.
The second subcase is $B_2 \cup C \ne V_{J_2}$, when a similar argument gives $S_{B,W} = S_{B_1,J_1}S_{B_2 \cup C, J_2} < p^{-1}$.
Finally, the third subcase is $B_1 \cup C = V_{J_1}$ and $B_2 \cup C = V_{J_2}$. Then $B_1$ contains $V_{J_1} \setminus (A_1 \cup C)$ and
$B_2$ contains $V_{J_2} \setminus (A_2 \cup C)$, which are both non-empty. Since $(xy,H^-)$ is strictly balanced we have
$S_{B_1,J_1} \le 1$ and $S_{B_2 \cup C, J_2} \le 1$, and so $S_{B,W} = S_{B_1,J_1}S_{B_2 \cup C, J_2} \le 1$. In all cases we have $S_{B,W} < p^{-1}$,
so $S_{B,W} \le n^{-1/(e_H-1)}p^{-1}$, since it is an integer power of $n^{1/(e_H-1)}$. Now Lemma 5.2 gives $N_{\phi,W} <
n^{4e_W\epsilon - 1/(e_H-1)}p^{-1}$. Summing over all possible joins we estimate $|U_{cd} \cap U_{c'd'}| < n^{-1/e_H}p^{-1}$, which
proves the claim. □

Returning to the proof of the lemma, we now set $\omega = n^{11e_H\epsilon}$ and choose $\omega$ edges of $P'_I$, say
$c_1d_1, \dots, c_\omega d_\omega$. Recall that $|U_{cd}| > n^{-10e_H\epsilon}p^{-1}$ for every $cd \in P'_I$. Then
$|U_{c_id_i} \setminus \bigcup_{j<i} U_{c_jd_j}| > n^{-10e_H\epsilon}p^{-1} - i\,n^{-1/e_H}p^{-1}$ for $1 \le i \le \omega$ by the claim. This gives

$$\Big|\bigcup_{i=1}^{\omega} U_{c_id_i}\Big| > \omega n^{-10e_H\epsilon}p^{-1} - \omega^2 n^{-1/e_H}p^{-1} > n^{\epsilon}p^{-1},$$

say. But by definition the sets $U_{c_id_i}$ are contained in $I$, for which $|I| = a = 3\mu^{-1}(\log n)^{1-1/(e_H-1)}p^{-1}$
is too small. This contradiction shows that we cannot have $|P_I| \ge n^{-5\epsilon}p^{-1}$ for some $I$ holding
together with the bounds used from Lemma 5.2. These bounds hold with high probability, so with
high probability we have $|P_I| < n^{-5\epsilon}p^{-1}$ for all $I$, i.e. $H$ has the smooth independence property. □
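
The counting step at the end of this proof is a Bonferroni-type estimate: a family of large sets with small pairwise intersections must have a large union. The sketch below illustrates that inequality on a toy family; the helper name `bonferroni_lower_bound` and the example family are hypothetical and chosen only for illustration, and the proof itself only needs a cruder form of the bound.

```python
from itertools import combinations


def bonferroni_lower_bound(sets):
    """Return omega*s - C(omega, 2)*t, a lower bound on the size of the union.

    Here omega is the number of sets, s the smallest set size and
    t the largest pairwise intersection size.
    """
    omega = len(sets)
    s = min(len(U) for U in sets)
    t = max((len(U & V) for U, V in combinations(sets, 2)), default=0)
    return omega * s - (omega * (omega - 1) // 2) * t


# Toy family: five windows of length 12; consecutive windows overlap in 2 points.
family = [set(range(10 * i, 10 * i + 12)) for i in range(5)]
union_size = len(set().union(*family))
lower_bound = bonferroni_lower_bound(family)
```

In the proof the roles of $s$ and $t$ are played by the lower bound on $|U_{cd}|$ and the upper bound from the claim, and the resulting lower bound on the union exceeds $|I| = a$.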

The two arguments above can be generalised to prove smooth independence for a wider class of 
graphs H. However, for the sake of brevity and clarity, we restrict our attention to these simple cases 
here. We complete the discussion of cliques by showing that $K_5$ has smooth independence. (The
independence numbers for the $K_3$-free and $K_4$-free processes have already been obtained in [7].)

Lemma 12.3 $K_5$ has smooth independence.
Proof. Write $H = K_5$. We argue as in the proof of Lemma 12.2. Consider $I$, $P_I$, $uv$, $xy$, $P'_I$,
$U_{cd}$ as defined in that proof. Now $(xy,H^-)$ is not strictly balanced, but we do have $S_{B,H^-} \le 1$ for
any $B$ with $xyu \subseteq B \subseteq V_H$, so for every $cd \in P'_I$ we still obtain the bound $|U_{cd}| > n^{-10e_H\epsilon}p^{-1}$.
Following that proof, our next step is to show that $|U_{cd} \cap U_{c'd'}| < n^{-1/e_H}p^{-1}$ for any two edges
$cd, c'd' \in P'_I$. In fact we will obtain a much stronger bound. Consider two embeddings $f_1, f_2$ of $H^-$
such that $f_1(x) = c$, $f_1(y) = d$, $f_2(x) = c'$, $f_2(y) = d'$ and $f_1(u) = f_2(u) = a$. Define $C$, $H'$, $J_1$,
$J_2$, $W$, $x'$, $y'$, $A$ and $\phi$ as before. Choose $B$ with $A \subseteq B \subseteq W$ maximising $S_{B,W}$. Note that for
any $K$ with $xy \subsetneq K \subseteq V_H$ we have $S_{K,H^-} \le 1$. So if $V_{J_1} = V_{J_2}$ we have $S_{B,W} \le 1$. Otherwise
we consider cases according to $B_1 = B \cap V_{J_1}$ and $B_2 = B \cap V_{J_2}$. Since $S_{B,W} = S_{B_1,J_1}S_{B_2 \cup C, J_2}$,
$S_{B,W} = S_{B_2,J_2}S_{B_1 \cup C, J_1}$ and $|B_1 \cup C|, |B_2 \cup C| \ge 3$ we see that $S_{B,W} \le 1$, except possibly in the case
$B_1 = \{x,y\}$ and $B_2 = \{x',y'\}$. In this case we note that there is an edge from $a \in C$ to $B_2$ that is
not contained in $J_1$, so $S_{B,W} \le pS_{B_1,J_1} = 1$. In all cases we have $S_{B,W} \le 1$ and so $N_{\phi,W} < n^{4e_W\epsilon}$.
Summing over all possible joins $W$ we estimate $|U_{cd} \cap U_{c'd'}| < n^{5e_W\epsilon}$, say. Now the remainder of the
proof follows as in Lemma 12.2. □



13 Concluding remarks 

We have restricted our attention in this paper to those aspects of the H-free process needed for our 
applications to Ramsey and Turan bounds. However, we also view this work as the first stage in the 
study of this process as a model of independent interest. In the course of our arguments we have 
already described some properties of the model via our asymptotic formulae for trackable extension 
variables; for example, we have shown that for fixed graphs Γ that do not contain H as a subgraph,
excluding 'critical' cases, the number of copies of Γ in G(i) is roughly the same as the number of
copies of Γ in the unconstrained random graph G(n,i). In principle, one may ask for analogues in
the graph G(i) produced by the H-free process of any property known to hold in G(n,i). But the
most natural next steps are continued investigation of the independence number and development 
of upper bounds on the number of steps in the H-free process. For independent sets, there are
other classes of graphs covered by our methods, but for clarity we have restricted our attention to 
certain concrete settings rather than stating a complicated general theorem. One might hope that 
any strictly 2-balanced graph can be analysed by these methods. With respect to upper bounds, we 
believe that the number of steps in the H-free process is at most a constant times the lower bound
we establish here for any strictly 2-balanced H. In fact, we are even prepared to make this conjecture 
for the degree of each vertex. 

Conjecture 13.1 For any strictly 2-balanced graph H there is a constant C so that with high probability
the maximal H-free graph G on n vertices produced by the H-free process has maximum degree
at most $Cn^{1-(v_H-2)/(e_H-1)}(\log n)^{1/(e_H-1)}$.

For the triangle-free process this follows from the bound on the independence number (see [7]), but
in general it is a separate question. The later evolution of the process, where Theorem 1.4 no longer
applies, is also an intriguing topic for further study.
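
For readers who wish to experiment with the model, here is a minimal simulation sketch of the simplest case $H = K_3$. It is not taken from the paper: it assumes the standard equivalent description in which all pairs are scanned in a uniformly random order and a pair is accepted whenever it closes no triangle, and the function names are hypothetical.

```python
import itertools
import random


def triangle_free_process(n, seed=0):
    """Run the K_3-free process on n vertices; return adjacency sets of the final graph."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    rng.shuffle(pairs)  # uniformly random order in which pairs are offered
    adj = {v: set() for v in range(n)}
    for u, v in pairs:
        # Adding uv creates a triangle exactly when u and v share a neighbour.
        if not (adj[u] & adj[v]):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_triangle_free(adj):
    """No edge uv has a common neighbour of u and v."""
    return all(not (adj[u] & adj[v]) for u in adj for v in adj[u])


def is_maximal(adj):
    """Every non-edge would close a triangle if added."""
    return all(v in adj[u] or (adj[u] & adj[v])
               for u, v in itertools.combinations(adj, 2))


adj = triangle_free_process(30, seed=1)
max_degree = max(len(neighbours) for neighbours in adj.values())
```

The final graph is triangle-free and maximal by construction, so quantities such as `max_degree` can be compared empirically with the conjectured order of magnitude for small $n$, with the caveat that asymptotic statements say little at this scale.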